Abstract

We explore two fundamental questions at the intersection of sampling theory and information theory:
how is channel capacity affected by sampling below the channel's Nyquist rate, and what sub-Nyquist
sampling strategy should be employed to maximize capacity? In particular, we first derive the capacity
of sampled analog channels for two prevalent sampling mechanisms: filtering followed by sampling
and sampling following filter banks. Connections between sampling and MIMO Gaussian channels are
illuminated based on this analysis. Optimal prefilters that maximize capacity are identified for both cases,
as well as several kinds of channels for which these sampling mechanisms are optimal to maximize
capacity at sub-Nyquist rates. We also highlight connections between sampled analog channel capacity
and minimum mean squared error estimation from sampled data. In particular, it is shown that for both
filtering and filter-bank sampling strategies, the filters maximizing capacity and minimizing mean squared
error are equivalent. We also investigate a more general sampling strategy by adding modulation banks,
which covers a broad class of nonuniform sampling methods applied in both theory and practice. We also
show a connection between this general sampling method and MIMO Gaussian channels. We then identify
the optimal sampling strategy for piece-wise flat sampled channels to be a single branch of modulation
and filtering. These results demonstrate the tradeoffs between channel capacity and sampling rate,
illustrate the interplay between sampling techniques and capacity of sampled analog channels, and
identify a simple optimal sampling strategy to maximize capacity for a large class of channels.

Index Terms 

sampling rate, channel capacity, sampled analog channels, sub-Nyquist sampling 



I. Introduction 

The capacity of continuous-time waveform channels and the corresponding capacity-achieving water-filling
power allocation strategy over frequency are well known [1], and provide much insight and
performance targets for practical communication system design. These results implicitly assume sampling 
above the Nyquist rate at the receiver end. However, channels that are not bandlimited have an infinite 
Nyquist rate and, hence, cannot be sampled at that rate. Moreover, hardware and power limitations often 
preclude sampling at the Nyquist rate of bandlimited channels, especially for wideband communication 
systems. This gives rise to a natural question at the intersection of sampling theory and information 
theory: how much information, in the Shannon sense, can be conveyed through undersampled analog
channels? There are two fundamental capacity metrics of interest: (1) the capacity for
a given sampling mechanism, which incorporates the sampling mechanism into the channel model; (2) 
the capacity for a fixed sampling rate when optimizing over a general class of sampling mechanisms at 
that rate. The second metric requires that the sampling mechanism be optimized relative to capacity. In 
this paper, we study several sampling mechanisms of increasing complexity, and investigate the interplay 
between capacity and these sampling strategies. In particular, for each of these sampling strategies, we 
determine the capacity as a function of the sampling rate, and identify classes of channels for which the 
sampling strategy is optimal at any given sampling rate below the Nyquist rate. 



A. Related Work 

The derivation of the capacity of linear time-invariant (LTI) waveform channels was pioneered by
Shannon et al. [3]. Making use of the asymptotic spectral properties of Toeplitz operators [4] or,
alternatively, Fourier analysis [5], this capacity result established the optimality of a water-filling power
allocation based on signal-to-noise ratio (SNR) across the frequency domain [1], which has motivated
practical power and bit loading in multicarrier communications [6]. The Shannon framework has also been
extended to wideband fading channels [7]-[9], multiple-input-multiple-output (MIMO) channels [10]-[12],
and non-coherent channels [13]-[15]. On the other hand, the Shannon-Nyquist sampling theorem,
which dictates that channel capacity is preserved when the received signal is sampled at or above the
Nyquist rate since no information is lost, has been used since the early days of information theory to
transform analog channels into their discrete counterparts, e.g. [16]. Forney et al. [17] survey
minimum-bandwidth orthogonal pulse amplitude modulation (PAM) techniques for serial transmission over linear
Gaussian channels, which allow the lossless conversion between analog and digital channels through
Nyquist-rate sampling. This paradigm of discretization has also been employed by Medard et al. to
bound the maximum mutual information in time-varying channels [8], [13]. However, all of these works
focus on the capacity of analog channels sampled at or above the Nyquist rate, and do not account for
the effect upon capacity of reduced-rate sampling.

The Nyquist rate is the sampling rate required for perfect reconstruction of bandlimited analog signals
or, more generally, the class of signals lying in shift-invariant subspaces [18], [19]. Various sampling
methods at this rate for bandlimited functions were reviewed by Jerri [20], including both uniform and
nonuniform pointwise sampling techniques. Examples include recurrent non-uniform sampling proposed
by Yen [21], which samples the signal in such a way that all sample points are divided into blocks where
each block contains N points and has a recurrent period. Another example is filter-bank sampling first
analyzed by Papoulis [22], in which the input signal is sampled through M linear systems. For perfect
reconstruction, these methods require sampling at an aggregate rate equal to or above the Nyquist rate.

In practice, however, the Nyquist rate may be excessive for perfect reconstruction of signals that possess 
certain structure. For example, consider multiband signals, whose spectral content resides continuously 
within several subbands over a wide spectrum, as might occur in a cognitive radio system [23], [24]. If
the spectral support is known a priori, then the sampling rate requirement for perfect recovery is the sum
of the subband bandwidths (including both positive and negative frequencies), termed the Landau rate
[25]. One type of sampling mechanism that can reconstruct multiband signals sampled at the Landau
rate is a filter bank followed by sampling, or "generalized" sampling, studied in [26], [27]. The basic
paradigm is to apply a bank of prefilters to the analog signal, each followed by a uniform sampler. A 
bank of filters followed by sampling is an effective class of non-uniform sampling methods that is widely 
applied in theory and practice pT| . 

When the channel or signal structure is unknown, other sampling methods have been investigated
to see under what conditions a signal can be recovered from sub-Nyquist samples. Inspired by recent
"compressive sensing" ideas [28], [29], sub-Nyquist sampling approaches have been developed to exploit
the structure of various classes of input signals, such as multiband signals [30], [31] and signals with finite
rate of innovation [32], [33]. Modulation and filter banks followed by sampling, where the signal is passed
through modulation banks and filter banks before sampling, has proven to be very effective for signal
reconstruction at sub-Nyquist sampling rates. By scrambling spectral contents from different subbands
through the modulation operation, this method performs well in subsampling sparse multiband signals
with unknown spectral support. One example of this sampling mechanism is the modulated wideband
converter (MWC) proposed by Mishali et al. [30], [34], where all post-modulation filters are chosen to
be low-pass filters. In fact, modulation and filter banks followed by sampling represents the most general
class of realizable nonuniform sampling techniques, although it does not include certain techniques such
as sampling at random sample times.



Most of the above sampling theoretic work aims at finding optimal sampling and reconstruction
mechanisms that achieve perfect reconstruction of a class of analog signals from noiseless samples.
There has also been work on minimum reconstruction error from noisy samples based on certain
statistical measures (e.g. MSE [35], [36]). Another line of work pioneered by Berger et al. [37]-[41]
investigated joint optimization of the transmitted pulse shape and receiver prefiltering in PAM over an
analog communication channel under sub-Nyquist sampling. In this work the optimal receiver prefilter
that minimizes the MSE between the original signal and the signal reconstructed from the samples is
shown to prevent aliasing. However, this work does not consider optimal sampling techniques based on
the information-theoretic metric of channel capacity achievable through noisy samples of the channel
output. In addition, the optimal filters derived in [37], [39] are used to determine an SNR metric
which in turn is used to compute an approximation to sampled channel capacity based on the formula
for capacity of the bandlimited AWGN channels. This approximation does not correspond to the precise
capacity of undersampled bandlimited AWGN channels we derive herein, nor is the capacity of more
general undersampled analog channels considered. Guo et al. explored the connection between mutual
information and MMSE for Gaussian channels [42], but focused on the discrete domain and therefore
did not account for undersampling of analog signals.



The tradeoff between capacity and hardware complexity has been studied in another line of work
focused on sampling precision [43]-[45]. These works demonstrate that, due to quantization of samples,
sampling above the Nyquist rate can be beneficial in increasing achievable data rates. The focus of this
quantization analysis is on the effect of increasing the sampling rate beyond the Nyquist rate to combat
quantization error, whereas this paper is concerned with determining capacity and optimal sub-Nyquist
sampling strategies for channels based on the channel structure, without considering quantization errors.

B. Contribution



Sampling structures typically rely on general prefiltering prior to sampling [18], which can suppress
aliasing and post-sampling noise, minimize the recovery error for certain classes of input signals, and
account for non-ideal linear distortion features of practical acquisition devices [46], [47]. Here, we explore
sampled analog channels with the following three classes of sampling mechanisms: (1) a filter followed
by sampling: the analog channel output is prefiltered by a single linear filter followed by an ideal uniform
sampler (see Fig. 2); (2) sampling following filter banks: the analog channel output is passed through
a bank of LTI filters, each followed by an ideal uniform sampler (see Fig. 3); (3) modulation and filter
banks followed by sampling: the channel output is split into M branches, where each branch is prefiltered
by an LTI filter, modulated by a different modulation sequence, passed through another LTI filter and
then sampled uniformly. Our main contributions are summarized as follows.

• Filtering followed by sampling. We derive the capacity for sampled analog channels with this 
sampling mechanism in the presence of both white noise and colored noise. Due to aliasing, the 
sampled channel can be represented as a MISO Gaussian channel in the spectral domain, while the 
optimal input effectively performs maximum ratio combining. The optimal prefilter is derived and 
shown to extract out the frequency with the highest SNR while suppressing signals from all other 
frequencies. This prefilter also minimizes the MSE between the original signal and the reconstructed 
signal, illuminating a connection between capacity and MMSE. 

• Filter banks followed by sampling. A closed-form expression for sampled channel capacity is 
derived, along with analysis that relates it to a MIMO Gaussian channel. The input should be chosen 
to decouple the dimensions of the equivalent MIMO channel. We also derive optimal filter banks 
that maximize capacity. The M filters select the M frequencies with the M highest SNRs and zero 
out signals from all other frequencies. This strategy is also shown to minimize the MSE between the 
original and reconstructed signals. This mechanism often achieves larger sampled channel capacity 
than a single filter followed by sampling if the channel is non-monotonic, and it achieves the analog
capacity of multiband channels sampled at the Landau rate if the number of branches is appropriately 
chosen. 

• Modulation and filter banks followed by sampling. For modulation sequences that are periodic
with period $T_q$, we derive the sampled channel capacity and show its connection to a more
general MIMO Gaussian channel in the frequency domain than in the case without modulation
banks. For sampling following a single branch of modulation and filtering, we provide an algorithm
to identify the optimal modulation sequence for piece-wise flat channels when $T_q$ is an integer
multiple of the sampling period. This single-branch mechanism achieves the same performance as
employing an optimal filter bank with each branch sampled at a period $T_q$.
One interesting fact we discover for all these techniques is the non-monotonicity of capacity with sampling 
rate, which indicates that more sophisticated sampling techniques are needed to maximize achievable data 
rates under sub-Nyquist sampling. 



C. Organization 

The remainder of this paper is organized as follows. In Section II, we describe the problem formulation
of sampled analog channels for the sampling mechanisms described above. We then state our formal
capacity results for each of these sampling mechanisms: sampling with a filter, with a filter bank, and
with modulation and filter banks, along with their implications in Sections III-V. In particular, in each
section the main theorems are analyzed and interpreted based on Fourier analysis and classical MIMO
channel results, with numerical examples provided to illustrate the loss in capacity due to reduced-rate
sampling. Optimal sampling structures that maximize capacity under reduced-rate sampling are derived
under both a filter followed by sampling and sampling following filter banks. These optimal structures
are shown to minimize the MSE between the input and a linearly reconstructed signal in Section VI.
Proofs of the main theorems are provided in the appendices. Our notation is summarized
in Table I.

II. Preliminaries: Capacity of undersampled channels 
A. Capacity Definition 

We consider the same waveform channel model as Gallager [1, Chapter 8]. The transmit signal $x(t)$
is time constrained to the interval $(0, T]$. The channel is modeled as an LTI filter with impulse response
$h(t)$ and frequency response $H(f) = \int_{-\infty}^{\infty} h(t)\exp(-j2\pi ft)\,dt$. The analog channel output is given as

$$ r(t) = h(t) * x(t) + \eta(t), \tag{1} $$

and is observed over $(0, T]$,¹ where $\eta(t)$ is stationary zero-mean Gaussian noise, as illustrated in
Fig. 1(a). We assume throughout the remainder of the paper that perfect channel state information, i.e.



¹We impose the assumption that both the transmit signal and the observed signal are constrained to finite time intervals to
allow for a rigorous definition of channel capacity. In particular, as per Gallager's analysis [1, Chapter 8], we first calculate the
capacity for finite time intervals and then take the limit of the interval to infinity.

Table I
Summary of Notation and Parameters

Symbol                          Meaning
$L_1$                           set of measurable functions $f$ such that $\int |f|\,d\mu < \infty$
$\mathcal{S}_+$                 set of positive semidefinite matrices
$h(t)$, $H(f)$                  impulse response and frequency response of the analog channel
$s_i(t)$, $S_i(f)$              impulse response and frequency response of the $i$th prefilter
$S_\eta(f)$, $S_x(f)$           spectral density of the noise $\eta(t)$ and the stationary input signal $x(t)$
$M$                             number of prefilters
$f_s$, $T_s$                    aggregate sampling rate and the corresponding sampling interval ($T_s = 1/f_s$)
$q_i(t)$                        modulating sequence in the $i$th channel
$T_q$                           period of the modulating sequence $q_i(t)$
$\|\cdot\|_F$, $\|\cdot\|_2$    Frobenius norm, $\ell_2$ norm
$\mathbf{v}^T$                  transpose of vector $\mathbf{v}$

Boldface is used for vectors and matrices.



perfect knowledge of $h(t)$, is known at both the transmitter and the receiver. The traditional Shannon
framework investigates the maximum mutual information between $x(t)$ and $r(t)$ under a power constraint.
Specifically, the analog channel capacity is defined as [1, Section 8.1]

$$ C = \lim_{T\to\infty} \frac{1}{T} \sup I\big(x(0,T];\, r(0,T]\big), $$

where the supremum is over all input distributions subject to an average power constraint
$\frac{1}{T}\mathbb{E}\big(\int_0^T |x(t)|^2\,dt\big) \le P$, and we explicitly indicate the interval $(0, T]$ over which the signal is transmitted
and observed. For completeness, we repeat the classical analog capacity result from Gallager as follows.

Theorem 1. [1, Theorem 8.5.1] Consider an analog channel with a power constraint $P$ and noise
power spectral density $S_\eta(f)$. Assume that $|H(f)|^2/S_\eta(f)$ is bounded and integrable, and that either
$\int_{-\infty}^{\infty} S_\eta(f)\,df < \infty$ or that $S_\eta(f)$ is white (a constant). Then the capacity of the analog channel is given
parametrically by

$$ C = \frac{1}{2}\int_{f\in\mathcal{F}(\nu)} \log\left(\frac{\nu\,|H(f)|^2}{S_\eta(f)}\right) df, \tag{2} $$

where $\mathcal{F}(\nu)$ and $\nu$ satisfy

$$ \mathcal{F}(\nu) = \left\{ f : \frac{S_\eta(f)}{|H(f)|^2} \le \nu \right\}, \tag{3} $$

$$ \int_{f\in\mathcal{F}(\nu)} \left( \nu - \frac{S_\eta(f)}{|H(f)|^2} \right) df = P. \tag{4} $$

The channel is frequency-selective, where the SNR at each frequency $f$ in the spectral domain
is captured by $|H(f)|^2/S_\eta(f)$. The optimal transmission strategy is to perform water-filling power
allocation over frequency. For a channel of bandwidth $B$ over positive frequencies, if we remove the
noise outside the channel bandwidth via prefiltering and sample the output at a rate $f_s \ge 2B$, then we
can perfectly recover all spectral contents of the received signal (including transmitted signal and noise)
within the channel bandwidth, which allows capacity (2) to be achieved without loss of data rate due
to sampling. For this reason, we will use the terminology Nyquist-rate channel capacity for the analog
channel capacity (2), which is commensurate with sampling at or above the Nyquist rate of the received
signal after optimized prefiltering.
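
As a numerical illustration of the water-filling characterization in Theorem 1, the following minimal sketch (not part of the original analysis) approximates the integrals by Riemann sums on a uniform frequency grid and finds the water level $\nu$ by bisection. The function name `water_filling_capacity`, its arguments and the grid discretization are assumptions made for illustration; logarithms are natural, so the result is in nats per second, and at least one grid cell is assumed to have positive SNR.

```python
import math

def water_filling_capacity(snr, df, power, iters=200):
    """Approximate the parametric capacity of Theorem 1 on a uniform grid.

    snr   : samples of |H(f)|^2 / S_eta(f), one per grid cell (hypothetical input)
    df    : width of each grid cell in Hz
    power : power budget P
    Returns (capacity in nats per second, water level nu).
    """
    # The inverse SNR is the "floor" on which the water level nu sits.
    floor = [1.0 / g if g > 0 else math.inf for g in snr]

    def used_power(nu):
        # Riemann-sum version of the power constraint.
        return sum(max(nu - v, 0.0) for v in floor) * df

    lo, hi = 0.0, min(floor) + power / df   # used_power(hi) >= power
    for _ in range(iters):                  # bisection on the water level
        mid = 0.5 * (lo + hi)
        if used_power(mid) < power:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    # Riemann-sum version of the capacity integral over cells below the water level.
    capacity = 0.5 * df * sum(math.log(nu / v) for v in floor if v < nu)
    return capacity, nu

# Flat channel on [-1, 1] Hz with SNR 4 and P = 2: capacity is B*log(1 + P*SNR/(2B)) with B = 1.
cap, nu = water_filling_capacity([4.0] * 1000, 0.002, 2.0)
```

For the flat toy channel in the last lines, the sketch reproduces the familiar bandlimited result $B\log(1 + P\gamma/(2B))$ with $B = 1$ and $\gamma = 4$.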

Under sub-Nyquist sampling, the capacity typically depends on the sampling mechanism and its
sampling rate. Specifically, the channel output $r(t)$ is now passed through the receiver's analog front
end, which may include a filter, a bank of $M$ filters, or a bank of preprocessors consisting of filters and
modulation modules, yielding a collection of analog outputs $\{y_i(t) : 1 \le i \le M\}$. We assume that the
analog outputs are observed over the time interval $(0, T]$ and then passed through ideal uniform samplers,
yielding a set of digital sequences $\{y_i[n] : n \in \mathbb{Z}, 1 \le i \le M\}$, as illustrated in Fig. 1(b). Here, each
branch is uniformly sampled at a sampling rate of $f_s/M$ samples per second.

Defining the sampled sequence as $\mathbf{y}[n] = [y_1[n], \cdots, y_M[n]]$, the problem of finding the capacity $C(f_s)$
of sampled analog channels can be posed as quantifying the maximum mutual information between the
input signal $x(t)$ on the interval $(0, T]$ and the output sequence sampled at an aggregate rate $f_s$ on the
interval $(0, T]$ in the limit as $T \to \infty$. The sampled channel capacity can thus be expressed as

$$ C(f_s) = \lim_{T\to\infty} \frac{1}{T} \sup I\big(x(0,T];\, \{\mathbf{y}[n]\}_{(0,T]}\big), $$

where the supremum is taken over all possible input distributions subject to an average power constraint
$\frac{1}{T}\mathbb{E}\big(\int_0^T |x(t)|^2\,dt\big) \le P$, and we explicitly indicate the interval $(0, T]$ over which the samples are
taken.

The sequence of discrete-time samples depends on the sampling rate and the sampling mechanism 
we employ, which impacts the information conveyed through this sampled sequence. In our analysis we 
focus on developing a general analytic framework for sampled channel capacity that accommodates a 
large class of sampling mechanisms, going beyond ideal uniform Nyquist-rate sampling. For ease of 
exposition, we organize this paper in incremental steps associated with increasingly complex sampling 
strategies, where the first steps lay the analytical foundation for the later steps. In particular, starting 
from sampling following a single filter, we extend our results to incorporate filter banks and modulation
banks, which is the most general class of realizable non-uniform sampling strategies.

Figure 1. (a) Original analog channel: the input $x(t)$, constrained to $(0, T]$, is passed through the channel to yield the received
signal $r(t)$ without preprocessing and sampling; (b) sampled analog channel: the input $x(t)$, constrained to $(0, T]$, is passed
through $M$ branches of the receiver analog front end to yield analog outputs $\{y_i(t) : 1 \le i \le M\}$; each analog output $y_i(t)$ is
observed over $(0, T]$ and uniformly sampled by a sampler at a rate $f_s/M = (MT_s)^{-1}$ samples per second to yield the sampled
sequence $y_i[n]$. The preprocessor can be a filter, or a filter and a modulator followed by another filter.

B. Sampling Mechanisms 

In this subsection, we formally describe the three classes of sampling strategies we investigate. 
1) Filtering followed by sampling: Ideal uniform sampling is performed by sampling the analog signal
uniformly at a rate $f_s = T_s^{-1}$. In order to avoid aliasing, suppress out-of-band noise and compensate for
distortion, a prefilter is often added prior to the ideal uniform sampler [18]. Adding a prefilter can also
be used to model the linear distortion features of practical sampling devices. Our sampling process thus
includes a general analog prefilter, as illustrated in Fig. 2. Specifically, before sampling, we prefilter the
received signal with an LTI filter that has impulse response $s(t)$ and frequency response $S(f)$, where we
assume that $h(t)$ and $s(t)$ are both bounded and continuous. The filtered output is observed over $(0, T]$
and can be written as

$$ y(t) = s(t) * \big(h(t) * x(t) + \eta(t)\big), \quad t \in (0, T]. \tag{5} $$

We then sample $y(t)$ using an ideal uniform sampler, leading to the sampled sequence

$$ y[n] = y(nT_s), $$

where $T_s$ denotes the sampling interval. The metric of interest is then the maximum mutual information
between $x(0, T]$ and $\{y[n]\}_{(0,T]}$.
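
The signal path of this mechanism can be mimicked on a dense time grid. The sketch below is only an illustration under stated assumptions: all waveforms are given as lists of values on a grid of spacing `dt` starting at $t = 0$, the filters are causal, convolutions are replaced by Riemann sums, and the sampling interval is an integer multiple of `dt`. The helper names `convolve_causal` and `filter_then_sample` are hypothetical.

```python
def convolve_causal(a, b, dt):
    """Riemann-sum approximation of (a * b)(t) on a uniform grid, truncated to len(a)."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(a):
                out[i + j] += ai * bj * dt
    return out

def filter_then_sample(x, h, s, noise, dt, Ts):
    """Approximate y(t) = s * (h * x + eta) on the grid and return y[n] = y(n*Ts)."""
    r = [u + w for u, w in zip(convolve_causal(x, h, dt), noise)]  # r(t) = h * x + eta
    y = convolve_causal(r, s, dt)                                  # y(t) = s * r
    step = round(Ts / dt)   # Ts is assumed to be an integer multiple of dt
    return y[::step]

# Toy check: unit-area impulses for h and s (value 1/dt in one grid cell) and no noise,
# so the samples are simply every second value of x.
samples = filter_then_sample([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [2.0], [2.0], [0.0] * 6, 0.5, 1.0)
```

Replacing the impulse used for `s` by a low-pass response gives an anti-aliasing prefilter in the same framework.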



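[Figure 2: block diagram. The input $x(t)$ passes through the channel $h(t)$, noise $\eta(t)$ is added to give $r(t)$, which is filtered by $s(t)$ to give $y(t)$ and sampled at $t = nT_s$ to give $y[n]$.]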



Figure 2. Filtering followed by sampling: the analog channel output r(t) is linearly filtered prior to ideal uniform sampling. 



2) Sampling following Filter Banks: Sampling following a single filter often falls short of exploiting
channel structure. In particular, although Nyquist-rate uniform sampling preserves information for
bandlimited signals, for multiband signals it does not ensure perfect reconstruction at a rate approaching
the Landau rate (i.e. the total width of the spectral support). That is because uniform sampling at a
sub-Nyquist rate may suppress information by collapsing subbands, resulting in fewer degrees of freedom.
This motivates us to investigate certain nonuniform sampling mechanisms. In particular, we now consider
the class of non-uniform sampling mechanisms that is most widely used in practice, where the received
signal is preprocessed by a bank of filters. Most nonuniform sampling techniques that have been studied
in theory and applied in practice [22], [26], [27] fall under filter-bank sampling and modulation-bank
sampling (as described in Section II-B3). Note that the filters may introduce any given delay, so this approach
subsumes that of a filter bank with different sampling times at each branch.

In this sampling strategy, we replace the single prefilter in Fig. 2 by a bank of $M$ analog filters followed
by ideal sampling at rate $f_s/M$ for each branch, as illustrated in Fig. 3. We denote by $s_i(t)$ and $S_i(f)$ the
impulse response and frequency response of the $i$th linear filter, respectively. The filtered analog output
in the $i$th branch prior to sampling is then given as

$$ y_i(t) = \big(h(t) * s_i(t)\big) * x(t) + s_i(t) * \eta(t), \quad t \in (0, T]. \tag{6} $$

These filtered signals are then passed through $M$ ideal samplers to yield

$$ y_i[n] = y_i(nMT_s) \quad\text{and}\quad \mathbf{y}[n] = \big[y_1[n], y_2[n], \cdots, y_M[n]\big], \tag{7} $$

where $T_s = f_s^{-1}$. The capacity now is the maximum mutual information between $x(0, T]$ and $\mathbf{y}[n]$.

Figure 3. A filter bank followed by sampling: the received analog signal $r(t)$ is passed through $M$ branches. In the $i$th branch,
the signal $r(t)$ is passed through an LTI prefilter with frequency response $S_i(f)$, and then sampled uniformly by an ideal uniform
sampler.



3) Modulation and Filter Banks Followed by Sampling: We generalize a filter bank followed by
sampling by adding an additional filter bank and a modulation bank, which includes as special cases a
broad class of nonuniform sampling methods that are applied in both theory and practice. Specifically, the
sampling system with sampling rate $f_s$ comprises $M$ different branches. In the $i$th branch, the received
analog signal $r(t)$ is prefiltered by an LTI filter with impulse response $p_i(t)$ and frequency response
$P_i(f)$, modulated by a periodic waveform $q_i(t)$ of period $T_q$, filtered by another LTI filter with impulse
response $s_i(t)$ and frequency response $S_i(f)$, and then sampled uniformly at a rate $f_s/M = (MT_s)^{-1}$,
as illustrated in Fig. 4. The first prefilter $P_i(f)$ will be useful in removing out-of-band noise, while the
periodic waveforms scramble spectral contents from different aliased sets, thus bringing in more design
flexibility that may potentially lead to better exploitation of channel structures. By taking advantage of
random modulation sequences to achieve incoherence among different branches, this sampling mechanism
has proven useful for sub-sampling analog multiband signals by exploiting spectral sparsity [30]. Note
that as with previous methods, the filters can introduce arbitrary delay, so that the branches may be
sampled at different times.

In the $i$th branch, the received prefiltered analog signal in the time interval $(0, T]$ prior to sampling
can be written as

$$ y_i(t) = s_i(t) * \Big(q_i(t)\cdot\big(p_i(t) * h(t) * x(t) + p_i(t) * \eta(t)\big)\Big), \tag{8} $$

resulting in the digital sequence of samples

$$ y_i[n] = y_i(nMT_s) \quad\text{and}\quad \mathbf{y}[n] = \big[y_1[n], \cdots, y_M[n]\big]^T. \tag{9} $$
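
The branch structure described here (prefilter, modulate, filter again, then sample every $MT_s$ seconds) can be sketched on a dense time grid as follows. This is an illustrative discretization rather than the construction analyzed in the paper: waveforms are lists on a grid of spacing `dt`, filters are causal, each modulating waveform is supplied as its values on the same grid, and $MT_s$ is assumed to be an integer multiple of `dt`. The names `convolve_causal`, `modulated_branch` and `sample_bank` are hypothetical. Setting the modulating waveform to all ones and the first filter to a unit-area impulse reduces a branch to plain filter-bank sampling.

```python
def convolve_causal(a, b, dt):
    """Riemann-sum approximation of (a * b)(t) on a uniform grid, truncated to len(a)."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(a):
                out[i + j] += ai * bj * dt
    return out

def modulated_branch(r, p, q, s, dt, step):
    """One branch: prefilter by p, multiply by q, filter by s, keep every step-th value."""
    v = convolve_causal(r, p, dt)              # prefilter p_i
    v = [qk * vk for qk, vk in zip(q, v)]      # modulation by q_i(t)
    y = convolve_causal(v, s, dt)              # post-modulation filter s_i
    return y[::step]                           # y_i[n] = y_i(n * M * Ts)

def sample_bank(r, branches, dt, Ts):
    """branches is a list of (p, q, s) triples; row n of the result is y[n]."""
    M = len(branches)
    step = round(M * Ts / dt)                  # M*Ts assumed an integer multiple of dt
    cols = [modulated_branch(r, p, q, s, dt, step) for (p, q, s) in branches]
    return [list(row) for row in zip(*cols)]

# Toy usage with dt = Ts = 1: branch 1 is a pass-through; branch 2 alternates the sign
# of the input and then delays it by one grid step (a delayed impulse as post-filter).
delta = [1.0]
rows = sample_bank([1.0] * 8,
                   [(delta, [1] * 8, delta), (delta, [1, -1] * 4, [0.0, 1.0])],
                   1.0, 1.0)
```

The delayed impulse in the second branch illustrates how a filter delay plays the role of a shifted sampling time.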

Figure 4. Modulation and filter banks followed by sampling: in each branch, the received signal is prefiltered by an LTI filter
with impulse response $p_i(t)$, modulated by a periodic waveform $q_i(t)$, filtered by another LTI filter with impulse response $s_i(t)$,
and then sampled at a rate $f_s/M$.



C. Sampling Preliminaries 

Before proceeding, we recall some preliminaries from sampling theory. Suppose that a sampled
sequence $x[n]$ of a continuous-time signal $x(t)$ is obtained by sampling $x(t)$ at a rate $f_s$, i.e.
$x[n] = x(nT_s)$. The discrete time Fourier transform of $x[n]$ is given by

$$ \frac{1}{T_s}\sum_{k=-\infty}^{\infty} X(f - kf_s), \tag{10} $$

where $X(f) = \int_{-\infty}^{\infty} x(t)\exp(-j2\pi ft)\,dt$ is the continuous-time Fourier transform of $x(t)$. In other words,
the uniformly sampled signal depends on the periodic extension of the original signal in the frequency
domain.
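
This aliasing relation can be checked numerically. The sketch below is an illustration only: it uses the Gaussian pulse $x(t) = \exp(-\pi t^2)$, whose Fourier transform is $X(f) = \exp(-\pi f^2)$, and compares a truncated discrete time Fourier transform of the samples with a truncated periodic extension of $X(f)$ scaled by $1/T_s$. The function names and the truncation lengths are hypothetical choices.

```python
import cmath
import math

def dtft_of_samples(x, Ts, f, n_max=60):
    """Truncated DTFT of x[n] = x(n*Ts), evaluated at analog frequency f."""
    return sum(x(n * Ts) * cmath.exp(-2j * math.pi * f * n * Ts)
               for n in range(-n_max, n_max + 1))

def folded_spectrum(X, Ts, f, k_max=60):
    """Truncated periodic extension (1/Ts) * sum_k X(f - k*fs) with fs = 1/Ts."""
    fs = 1.0 / Ts
    return sum(X(f - k * fs) for k in range(-k_max, k_max + 1)) / Ts

def gauss(u):
    # exp(-pi*u^2) is its own Fourier transform, so it serves as both x(t) and X(f).
    return math.exp(-math.pi * u * u)

lhs = dtft_of_samples(gauss, 0.7, 0.3)   # computed from the time-domain samples
rhs = folded_spectrum(gauss, 0.7, 0.3)   # computed from the aliased spectrum
```

The two quantities agree to numerical precision, which is the Poisson summation formula at work; the aliased terms with $k \neq 0$ are exactly what a sub-Nyquist sampler folds onto the baseband.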

For convenience of exposition, we introduce the following notation of an infinite-dimensional vector,
which is the sampled vector associated with $X(f)$:

$$ \mathbf{V}_X(f, f_s) = \frac{1}{T_s}\big[\cdots, X(f - f_s), X(f), X(f + f_s), \cdots\big]^T, \tag{11} $$

such that the $l$th ($l \in \mathbb{Z}$) coordinate of $\mathbf{V}_X(f, f_s)$ is $\frac{1}{T_s}X(f + lf_s)$. If $x(t)$ is uniformly sampled with
sampling rate $f_s$, then the discrete time Fourier transform of $x[n]$ can be written as

$$ \mathbf{V}_X^T(f, f_s)\cdot\mathbf{1}, \tag{12} $$

where $\mathbf{1}$ is an infinite vector whose entries are all equal to 1. We will see later that the $\ell_2$ norm of the
sampled vector plays a key role in the capacity of the sampled analog channel. This norm is given by

$$ \|\mathbf{V}_X(f, f_s)\|_2 = \frac{1}{T_s}\left(\sum_{l=-\infty}^{\infty} |X(f - lf_s)|^2\right)^{1/2}, \tag{13} $$

which characterizes the folded signal magnitude at frequency $f$. For notational convenience, we use

$$ \mathbf{V}_{XY+Z}(f, f_s) = \frac{1}{T_s}\big[\cdots, X(f - f_s)Y(f - f_s) + Z(f - f_s), X(f)Y(f) + Z(f), \cdots\big]^T \tag{14} $$

to denote the sampled vector associated with $X(f)Y(f) + Z(f)$. In particular, if $Z(f) = 0$ for all $f$,
then we have the notation

$$ \mathbf{V}_{XY}(f, f_s) = \frac{1}{T_s}\big[\cdots, X(f - f_s)Y(f - f_s), X(f)Y(f), \cdots\big]^T. \tag{15} $$



D. Exposition Outline 



The main results for the three sampling strategies introduced in Section II-B are formally stated and
analyzed in the following sections. For each scenario, we first provide an approximate treatment based on 
Fourier analysis by relating the sampled channel to a traditional MISO or MIMO Gaussian channel. This 
approach, while not strictly rigorous, allows for a more intuitive and informative understanding of our 
results from a communication theoretic perspective. We also provide complementary interpretations of 
these results along with several numerical examples. In particular, we analyze how to optimize capacity for 
both filtering followed by sampling and sampling following filter banks, and interpret these optimization 
results from both information theoretic and sampling theoretic viewpoints. The rigorous analyses of the 
main theorems are deferred to the appendices, which make heavy use of asymptotic spectral properties of 
Toeplitz and block-Toeplitz matrices. This method of exposition, with an approximate analysis based on
Fourier analysis followed by a rigorous treatment using Toeplitz properties, is similar to the exposition
of waveform channel capacity used by Gallager in [1, Chapter 8].

III. A Filter Followed by Sampling 

A. Main Results 

Applying a prefilter generates a new equivalent channel with channel gain $H(f)S(f)$. The noise is
also prefiltered and is therefore non-white in general. The ideal uniform sampler that follows the prefilter 
creates an aliased version of the prefiltered signal in the frequency domain, as reflected in the following 
capacity expression. 

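Theorem 2. Consider the system shown in Fig. 2, where $\eta(t)$ is Gaussian noise with power spectral density
$S_\eta(f)$. Assume that $h(t)$, $s(t)$ are both continuous, bounded and absolutely Riemann integrable, and that
there exists some constant $\epsilon_s$ such that

$$ \big\|\mathbf{V}_{S\sqrt{S_\eta}}(f, f_s)\big\|_2^2 = \frac{1}{T_s^2}\sum_{l=-\infty}^{\infty} |S(f - lf_s)|^2 S_\eta(f - lf_s) \ge \epsilon_s > 0 \tag{16} $$

holds. Additionally, suppose that $h_\eta(t) := \mathcal{F}^{-1}\big(H(f)/\sqrt{S_\eta(f)}\big)$ satisfies $h_\eta(t) = o(t^{-\epsilon})$ for some constant
$\epsilon > 1$.² The capacity $C(f_s)$ of the sampled channel with a power constraint $P$ is then given parametrically
as

$$ C(f_s) = \frac{1}{2}\int_{f\in\mathcal{F}(\nu)} \log\left(\nu\,\frac{\sum_{l=-\infty}^{\infty} |H(f - lf_s)S(f - lf_s)|^2}{\sum_{l=-\infty}^{\infty} |S(f - lf_s)|^2 S_\eta(f - lf_s)}\right) df \tag{17} $$

$$ \phantom{C(f_s)} = \frac{1}{2}\int_{f\in\mathcal{F}(\nu)} \log\left(\nu\,\frac{\|\mathbf{V}_{HS}(f, f_s)\|_2^2}{\|\mathbf{V}_{S\sqrt{S_\eta}}(f, f_s)\|_2^2}\right) df, \tag{18} $$

where

$$ \mathcal{F}(\nu) = \left\{ f : \frac{\|\mathbf{V}_{S\sqrt{S_\eta}}(f, f_s)\|_2^2}{\|\mathbf{V}_{HS}(f, f_s)\|_2^2} \le \nu \ \text{ and } \ f \in \left[-\frac{f_s}{2}, \frac{f_s}{2}\right] \right\} \tag{19} $$

and $\nu$ satisfies

$$ \int_{f\in\mathcal{F}(\nu)} \left( \nu - \frac{\|\mathbf{V}_{S\sqrt{S_\eta}}(f, f_s)\|_2^2}{\|\mathbf{V}_{HS}(f, f_s)\|_2^2} \right) df = P. \tag{20} $$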



Remark 1. The assumption (16) that $\|\mathbf{V}_{S\sqrt{S_\eta}}(f, f_s)\|_2$ is bounded away from zero ensures that the
filter response $s(t)$ is not identically zero.



As expected, applying the prefilter modifies the channel gain and colors the noise accordingly. The
color of the noise is reflected in the denominator term of the corresponding SNR in (17) at each
$f \in [-f_s/2, f_s/2]$ within the sampling bandwidth. The linear time invariance of both the channel and the prefilter
response leads to an equivalent frequency-selective channel, and the ideal uniform sampling that follows
generates a folded version of the non-sampled channel capacity. Specifically, this capacity expression
differs from the analog capacity given in Theorem 1 in that the SNR in the sampled scenario is
$\gamma_s(f) := \| V_{HS}(f, f_s) \|_2^2 / \| V_{S\sqrt{S_\eta}}(f, f_s) \|_2^2$, in contrast to $\gamma_0(f) := |H(f)|^2 / S_\eta(f)$ for the non-sampled scenario.
Water filling over the inverse sampled SNR $1/\gamma_s(f)$ is the optimal power allocation.
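The parametric form of Theorem 2 can be evaluated numerically. The following is a minimal sketch, not the authors' procedure: it folds the signal and noise spectra over the aliased set, then water-fills over the inverse sampled SNR on a midpoint grid of the sampling bandwidth. The function names, the grid size, the truncation of the aliased sum, and the flat test channel are all assumptions made for illustration.

```python
import math

def folded_snr(f, fs, H, S, S_eta, n_alias=50):
    """Sampled SNR at f: folded signal power over folded noise power.

    The infinite aliased sum is truncated to |l| <= n_alias (an assumption).
    """
    num = 0.0
    den = 0.0
    for l in range(-n_alias, n_alias + 1):
        g = f - l * fs
        s2 = abs(S(g)) ** 2
        num += abs(H(g)) ** 2 * s2
        den += s2 * S_eta(g)
    return num / den if den > 0.0 else 0.0

def sampled_capacity(fs, P, H, S, S_eta, n_grid=400):
    """Water-fill over 1/gamma_s(f) on [-fs/2, fs/2]; result in bits per unit time."""
    df = fs / n_grid
    gammas = [folded_snr(-fs / 2 + (i + 0.5) * df, fs, H, S, S_eta)
              for i in range(n_grid)]
    inv = [1.0 / g for g in gammas if g > 0.0]
    if not inv:
        return 0.0
    # Bisection on the water level nu so that the allocated power equals P.
    lo, hi = min(inv), min(inv) + P / df
    for _ in range(200):
        nu = 0.5 * (lo + hi)
        used = sum(max(nu - x, 0.0) for x in inv) * df
        if used > P:
            hi = nu
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    return sum(0.5 * math.log2(nu / x) for x in inv if x < nu) * df

# Hypothetical test case: unit-gain channel on |f| <= 0.5, ideal low-pass
# prefilter with the same support, white noise of unit spectral density.
flat = lambda f: 1.0 if abs(f) <= 0.5 else 0.0
white = lambda f: 1.0
cap_nyquist = sampled_capacity(1.0, 3.0, flat, flat, white)
cap_half = sampled_capacity(0.5, 3.0, flat, flat, white)
```

For this flat test case the sampled SNR equals one at every frequency, so the result can be checked against the closed form $\frac{f_s}{2}\log_2(1 + P/f_s)$.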



$^2$This condition is used in Appendix A as a sufficient condition to guarantee asymptotic properties of Toeplitz matrices. A
similar condition will be used in Theorems 3 and 4.

B. Approximate Analysis

Rather than providing here a rigorous proof of Theorem 2, we first develop an approximate analysis
by relating the aliased channel to MISO channels, which allows for a communication interpretation as in
[1, Chapter 8.3]. The rigorous analysis, which is deferred to Appendix A, makes use of a discretization
argument and asymptotic spectral properties of Toeplitz matrices.

Consider first the equivalence between the sampled channel and a MISO channel at a single frequency
$f \in [-f_s/2, f_s/2]$. Suppose the transmitted signal has a frequency response$^3$ $X(f)$. The Fourier transform
of the sampled signal is given by

$$\frac{1}{T_s} \sum_{k=-\infty}^{+\infty} H(f - k f_s) S(f - k f_s) X(f - k f_s), \quad \forall f \in \left[ -\frac{f_s}{2}, \frac{f_s}{2} \right] \qquad (21)$$

due to aliasing. The summing operation allows us to treat the aliased channel at each frequency $f$ within
the sampling bandwidth as a separate MISO channel with countably many input branches and a single
output branch, as illustrated in Fig. 5(a).

By assumption, the noise is of spectral density $S_\eta(f)$, and hence the prefiltered noise has power spectral
density $S_\eta(f)|S(f)|^2$. The power spectral density of the sampled noise sequence at $f \in [-f_s/2, f_s/2]$ is then
given by $\| V_{S\sqrt{S_\eta}}(f, f_s) \|_2^2 = \sum_{l=-\infty}^{\infty} S_\eta(f - l f_s) |S(f - l f_s)|^2$. If we term $\{f - l f_s : l \in \mathbb{Z}\}$ the aliased
frequency set for $f$, then the amount of power allocated to $X(f - l f_s)$ should "match" the corresponding
channel gain within each aliased set in order to achieve capacity. It follows from known results [48] that
the MISO channel effectively has only one degree of freedom, and that the capacity-achieving strategy
for a MISO Gaussian channel, which is often referred to as transmit maximum ratio combining (MRC) or
beamforming, exploits the transmit diversity to maximize the received SNR. Specifically, denote by $G(f)$
the transmitted signal for each $f \in [-f_s/2, f_s/2]$. This signal is multiplied by a constant gain $c\alpha_l$ ($l \in \mathbb{Z}$),
and sent through the $l$th input branch, i.e.

$$X(f - l f_s) = c \alpha_l G(f), \quad \forall l \in \mathbb{Z}, \qquad (22)$$
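where $\alpha_l = \frac{S^*(f - l f_s) H^*(f - l f_s)}{\| V_{HS}(f, f_s) \|_2}$ and $c$ is a normalizing constant determined by the power constraint. The
resulting SNR can be expressed as the sum of SNRs (as shown in [48]) at each branch:

$$\frac{c^2 \| V_{HS}(f, f_s) \|_2^2}{\| V_{S\sqrt{S_\eta}}(f, f_s) \|_2^2}. \qquad (23)$$
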

$^3$This is an approximate analysis since the Fourier transform of the input signal may not even exist. The proof we provide
later does not use Fourier analysis but rather the convergence properties of Toeplitz operators.

Since the sampling operation combines signal components at frequencies from each aliased set
$\{f - l f_s : l \in \mathbb{Z}\}$, it is equivalent to having a set of parallel MISO channels, each indexed by some
$f \in [-f_s/2, f_s/2]$. Since each MISO channel has one degree of freedom, it can be converted to a set of
parallel SISO channels, where the channel at $f$ has an equivalent channel gain $\tilde{H}(f) = \| V_{HS}(f, f_s) \|_2$,
as illustrated in Fig. 5(b). The water-filling strategy is optimal in allocating power among the set of
parallel channels, which yields the parametric equations (19) and (20) and completes our approximate
analysis.
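The beamforming step of the approximate analysis can be checked numerically at a single frequency. The sketch below is illustrative only: the branch gains stand for $H(f - l f_s) S(f - l f_s)$, the noise terms stand for $|S(f - l f_s)|^2 S_\eta(f - l f_s)$, and the specific numbers, the truncation to three branches, and the function names are assumptions. It compares MRC weights with equal-gain weights of the same total transmit power.

```python
import math

def mrc_weights(gains):
    """Unit-norm transmit weights matched to the branch gains (conjugate / norm)."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in gains))
    return [complex(a).conjugate() / norm for a in gains]

def combiner_snr(gains, noise_psd, weights):
    """SNR after the aliasing sum: |sum_l a_l w_l|^2 over the folded noise power."""
    signal = sum(a * w for a, w in zip(gains, weights))
    return abs(signal) ** 2 / sum(noise_psd)

# Hypothetical branch gains and per-branch folded noise contributions.
gains = [1 + 1j, 0.5, -0.2j]
noise = [1.0, 0.5, 0.25]

snr_mrc = combiner_snr(gains, noise, mrc_weights(gains))
equal = [1 / math.sqrt(len(gains))] * len(gains)
snr_equal = combiner_snr(gains, noise, equal)
```

With unit-norm weights, the MRC value equals the squared norm of the gain vector divided by the folded noise power, which is the single-degree-of-freedom gain used in the text.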




Figure 5. Equivalent representations for filtering followed by sampling: (a) Equivalent MISO Gaussian channel for a given
$f \in [-f_s/2, f_s/2]$; (b) The equivalent set of parallel SISO channels representing all $f \in [-f_s/2, f_s/2]$, where the SISO channel
at a given frequency is equivalent to the MISO channel in Fig. 5(a).



C. Proof Sketch 

Since the Fourier transform is not well-defined for signals with infinite energy, there exist technical
flaws lurking in the approximate treatment of the previous subsection. The key step to circumvent these
issues is to explore the asymptotic properties of Toeplitz matrices/operators. This approach was used
by Gallager [1] to prove the analog channel capacity theorem. Under uniform sampling, however, the
sampled channel no longer acts as a Toeplitz operator, but instead becomes a block-Toeplitz operator.
Since Gallager's approach [1, Chapter 8.4] does not accommodate block-Toeplitz matrices, a new analysis
framework is needed. We provide here a roadmap of our analysis framework, and defer the complete
proof to Appendix A.

1) Discrete Approximation: The channel response and the filter response are both assumed to be
continuous, which motivates us to use a discrete-time approximation in order to transform the continuous-time
operator into its discrete counterpart. We discretize a process in the time domain by point-wise
sampling via an interval $\Delta$, e.g. $h(t)$ is transformed into $h[n]$ by setting

$h[n] = h(n\Delta)$.

For any given $T$, this allows us to use a finite-dimensional matrix to approximate the continuous-time
block-Toeplitz operator. Thanks to the continuity assumption, an exact capacity expression can be obtained
by letting $\Delta$ go to zero.

2) Spectral properties of block-Toeplitz matrices: After discretization, the input-output relation is 
similar to a MIMO discrete system. Applying MIMO channel capacity results leads to the capacity 
for a given $T$ and $\Delta$. The channel capacity is then obtained by taking $T$ to infinity and $\Delta$ to zero,
which can be related to the channel matrix's spectrum using Toeplitz theory. Since the prefiltered noise 
is non-white and correlated across time, we need to whiten it first. This, however, destroys the Toeplitz 
properties of the original system matrix. In order to apply established results in Toeplitz theory, we 
introduce the concept of asymptotic equivalence that builds connections between Toeplitz matrices and 
non-Toeplitz matrices. This allows us to relate the capacity limit with spectral properties of the channel 
and filter response. 
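The kind of limit that Toeplitz theory provides can be illustrated on a scalar example. The sketch below is not the block-Toeplitz argument of Appendix A; it only checks the classical Szegő-type statement that the normalized log-determinant of a Toeplitz matrix approaches the average of the logarithm of its symbol. The tridiagonal matrix, its entries, and the function names are assumptions chosen so that the determinant has a simple recurrence.

```python
import math

def toeplitz_logdet_per_dim(a, b, n):
    """(1/n) log det of the n x n tridiagonal Toeplitz matrix with `a` on the
    diagonal and `b` on both off-diagonals (requires a > 2|b|).

    Uses the ratio rho_k = D_k / D_{k-1} of leading principal minors, which
    obeys rho_k = a - b^2 / rho_{k-1}, to avoid overflow.
    """
    rho = a
    total = math.log(rho)
    for _ in range(n - 1):
        rho = a - b * b / rho
        total += math.log(rho)
    return total / n

def symbol_log_average(a, b, m=4096):
    """Midpoint-rule average of log(a + 2 b cos(w)) over one period of the symbol."""
    return sum(math.log(a + 2 * b * math.cos(2 * math.pi * (k + 0.5) / m))
               for k in range(m)) / m

finite = toeplitz_logdet_per_dim(3.0, 1.0, 500)
limit = symbol_log_average(3.0, 1.0)
```

For these values both quantities are close to $\log\frac{3 + \sqrt{5}}{2}$, and the gap shrinks as the matrix dimension grows.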

D. Optimal Prefilters 

1) Derivation of optimal prefilters: As we can see from Theorem 2, different prefilters lead to different
channel capacities. A natural question then is how to choose $S(f)$ to maximize capacity under filtering
followed by sampling. The optimizing prefilter is given in the following corollary.

Corollary 1. Consider the system shown in Fig. 2. Suppose that in each aliased set $\{f - l f_s : l \in \mathbb{Z}\}$,
there exists $k$ such that $\frac{|H(f - k f_s)|^2}{S_\eta(f - k f_s)} = \sup_{l \in \mathbb{Z}} \frac{|H(f - l f_s)|^2}{S_\eta(f - l f_s)}$. Then the capacity in (17) is maximized by the
filter with frequency response

$$S(f - k f_s) = \begin{cases} 1, & \text{if } \frac{|H(f - k f_s)|^2}{S_\eta(f - k f_s)} = \sup_{l \in \mathbb{Z}} \frac{|H(f - l f_s)|^2}{S_\eta(f - l f_s)}; \\ 0, & \text{otherwise}, \end{cases} \qquad (24)$$

for any $f \in [-f_s/2, f_s/2]$ and any $k \in \mathbb{Z}$.



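Proof: It can be observed from (17) that the frequency response $S(f)$ at any $f$ can only affect the SNR
at $f \bmod f_s$, indicating that we can optimize for frequencies $f_1$ and $f_2$ ($f_1 \ne f_2$; $f_1, f_2 \in [-f_s/2, f_s/2]$)
separately. Specifically, the SNR at each $f \in [-f_s/2, f_s/2]$ in the aliased channel is given by

$$\mathrm{SNR}(f) := \frac{\| V_{HS}(f, f_s) \|_2^2}{\| V_{S\sqrt{S_\eta}}(f, f_s) \|_2^2} = \sum_{l=-\infty}^{\infty} \lambda_l \frac{|H(f - l f_s)|^2}{S_\eta(f - l f_s)}, \qquad (25)$$

where

$$\lambda_l = \frac{|S(f - l f_s)|^2 S_\eta(f - l f_s)}{\sum_{k=-\infty}^{\infty} |S(f - k f_s)|^2 S_\eta(f - k f_s)}.$$

Note that $0 \le \lambda_l \le 1$ and $\sum_l \lambda_l = 1$. Thus, $\mathrm{SNR}(f)$ is a convex combination of $\left\{ \frac{|H(f - l f_s)|^2}{S_\eta(f - l f_s)}, l \in \mathbb{Z} \right\}$,
which is upper bounded by

$$\mathrm{SNR}(f)_{\max} = \sup_{l \in \mathbb{Z}} \frac{|H(f - l f_s)|^2}{S_\eta(f - l f_s)}. \qquad (26)$$

This bound can be attained by the filter given in (24). In other words, the optimal prefilter puts
all its mass in those frequency components with the highest SNR within each aliased frequency set
$\{f - l f_s : l \in \mathbb{Z}\}$. ■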
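The convex-combination argument of the proof can be sketched numerically for one aliased set. In the sketch below, `branch_snr[l]` plays the role of $|H(f - l f_s)|^2 / S_\eta(f - l f_s)$, `noise_psd[l]` the role of $S_\eta(f - l f_s)$, and `filt[l]` the role of $S(f - l f_s)$. The three-branch truncation, the numbers, and the function names are assumptions for illustration.

```python
def aliased_snr(branch_snr, noise_psd, filt):
    """Aliased SNR as a convex combination of per-branch SNRs.

    The weight of branch l is |S_l|^2 * S_eta_l, normalized to sum to one.
    """
    weights = [abs(s) ** 2 * n for s, n in zip(filt, noise_psd)]
    total = sum(weights)
    return sum(w / total * g for w, g in zip(weights, branch_snr))

def selection_filter(branch_snr):
    """Pass only the branch with the largest SNR and zero out the others."""
    best = max(range(len(branch_snr)), key=branch_snr.__getitem__)
    return [1.0 if l == best else 0.0 for l in range(len(branch_snr))]

# Hypothetical per-branch SNRs and noise densities for one aliased set.
branch_snr = [0.5, 4.0, 2.0]
noise_psd = [1.0, 2.0, 1.0]

snr_all_pass = aliased_snr(branch_snr, noise_psd, [1.0, 1.0, 1.0])
snr_selected = aliased_snr(branch_snr, noise_psd, selection_filter(branch_snr))
```

The all-pass filter averages the branch SNRs with noise-dependent weights, while the selection filter attains the supremum over the set.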

2) Interpretations: Recall that $S(f)$ is assumed to be right-invertible and is applied after the noise is
added. In the analog channel, the prefilter would become useless in terms of capacity benefits since we
are always able to recover the non-filtered signal by applying an inverse filter on $y(t)$, i.e. by the data
processing inequality no additional information can be obtained by filtering the received signal. However,
in the aliased channel, the prefiltering operation is non-invertible. As we show above, the aliased SNR is
a convex combination of SNRs at all aliased branches, indicating that $S(f)$ plays the role of "weighting"
different branches. As in MRC, those frequencies with larger SNR should be given higher weight, while
those that suffer from poor channel gain should be suppressed.

The problem of finding optimal prefilters can indeed be posed as a joint optimization over all input
and filter responses. Looking at the equivalent aliased channel for a given frequency $f \in [-f_s/2, f_s/2]$
as illustrated in Fig. 5(a), we have full control over both $X(f)$ and $S(f)$. The channel associated with
the frequency $f$ differs from a standard MISO channel model in that the prefiltering $S(f)$ allows us to
weight the branch input gain to the combiner. Thus it is equivalent to applying transmit beamforming (i.e.
transmit branch weighting) and receiver shaping (i.e. multiplication by $S(f)$) followed by combining in a
MIMO Gaussian channel. A related joint optimization problem over both prefiltering and postfiltering has
been investigated in [40], [41], but their approach is based on the MSE measure instead of an information
theoretic metric. We will discuss the relation with MSE optimization later in Section VI.

Although MRC at the transmitter side maximizes the combiner SNR for a MISO channel [48], it
turns out to be suboptimal for our joint optimization problem. The optimal solution is to perform
selection combining [48] by setting $S(f - l f_s)$ to one for some $l = l_0$, as well as noise suppression
by setting $S(f - l f_s)$ to zero for all other $l$. The prefilter outputs the signal on the branch with the
highest SNR, and suppresses noise from all remaining branches. The resulting combiner SNR becomes
$\max_l \mathrm{SNR}_l$ times the number of branches, which exceeds the SNR achieved by MRC ($\sum_l \mathrm{SNR}_l$). Here,
$\mathrm{SNR}_l$ denotes the channel gain $|H(f - l f_s)|^2$ divided by the noise. Setting $S(f)$ to zero precludes the
undesired effects of noise from low SNR frequencies, which is crucial in maximizing data rate.

Another interesting observation is that optimal prefiltering equivalently generates an alias-free
channel. After passing through an optimal prefilter, all frequencies modulo $f_s$ except the one with
the highest SNR are removed. Unless there exist multiple branches that possess the highest SNR,
the optimal filtering followed by sampling indeed suppresses aliasing as well as noise. This alias-suppressing
phenomenon, while different from many sub-Nyquist works that advocate mixing instead
of alias suppressing [28], [30], arises from the fact that we have control over the input shape. When
the input is given, the prefilter that maximizes the mutual information is the MRC type filter which
indeed mixes different frequency components from each aliased set. But a joint optimization over both
the input and the prefilter yields an input whose frequency support is equal to the sampling bandwidth,
thus resulting in an alias-suppressing filter in order to remove noise.

E. Numerical examples 

1) Additive Gaussian Noise Channel without Prefiltering: The first numerical example we consider is
an additive Gaussian noise channel. The channel gain is flat within the channel bandwidth $B$ (here, we
set $B = 0.5$), i.e. $H(f) = 1$ if $f \in [-B, B]$ and $H(f) = 0$ otherwise. The noise process is modeled as
a measurable and stationary Gaussian process with the power spectral density plotted in Fig. 6. In fact,
this is the noise model adopted by Lapidoth in [49] to approximate white noise, which avoids the infinite
variance of the standard model for unfiltered white noise.$^4$ In this example, we employ ideal point-wise
sampling without filtering.

Since the noise bandwidth is larger than the channel bandwidth, ideal uniform sampling without
prefiltering does not allow analog capacity to be achieved when sampling at a rate equal to twice the
channel bandwidth. This is because uniform sampling without prefiltering brings in noise from high-frequency
components outside the channel bandwidth. Increasing the sampling rate above twice the
channel bandwidth (but below the noise bandwidth) spreads the total noise power over a larger sampling
bandwidth, reducing the noise density at each frequency, which allows the sampled capacity to continue
increasing at sampling rates above the Nyquist rate, as illustrated in Fig. 6. It can be seen that the
capacity does not increase monotonically with the sampling rate, which is a consequence of the non-monotonicity
of the SNR in $f_s$, as described in more detail later. We also note that capacity does not
increase further when the sampling rate exceeds twice the noise bandwidth, since oversampling at any
rate above twice the noise bandwidth already preserves all content of the channel output - no further
information can be harvested.

$^4$In fact, the white noise process only exists as a "generalized process" in stochastic calculus, and ideal uniform sampling
operating on white noise without prefiltering brings in noise from high-frequency components, which results in a folded noise
process with infinite spectral density. In order to avoid this mathematical difficulty, we consider in this example Lapidoth's noise
model.

Figure 6. Additive Gaussian noise channel. The channel gain and the power spectral density of the noise are plotted in the left
plot. The sampling mechanism employed here is ideal uniform sampling without filtering. The power constraint is $P = 5$. The
sampled capacity, as illustrated in the right plot, does not achieve analog capacity when sampling at a rate equal to twice the
channel bandwidth, but does achieve it when sampling at a rate equal to twice the noise bandwidth.

2) Optimally Prefiltered Channel: In general, the frequency response of the optimal prefilter is
discontinuous, which may be hard to realize in practice. However, for certain classes of channel models,
the prefilter has a smooth frequency response. One example of this channel class is a monotone channel,
whose channel response obeys ^^^^^^^| > ^^^^^^^| for any /i > /2. Corollary 1 implies that the optimizing
prefilter for a monotone channel reduces to a low-pass filter with cutoff frequency $f_s/2$. As an example,
Fig. 7 illustrates the capacity-sampling tradeoff curve for the raised-cosine channel, for different roll-off
factors. The frequency response of the channel is given by

$$H(f) = \begin{cases} T, & |f| \le \frac{1 - \beta}{2T}; \\ \frac{T}{2} \left[ 1 + \cos \left( \frac{\pi T}{\beta} \left( |f| - \frac{1 - \beta}{2T} \right) \right) \right], & \frac{1 - \beta}{2T} < |f| \le \frac{1 + \beta}{2T}; \\ 0, & \text{otherwise}, \end{cases} \qquad (27)$$




where $\beta$ denotes the roll-off factor and $T$ is a given period. It can be observed that below the Nyquist
rate, capacity increases with $f_s$ since the effective sampling bandwidth increases, while oversampling
beyond the Nyquist rate does not increase capacity. As expected, sampling at or above the Nyquist rate
creates an alias-free capacity expression that can be simplified as



C{fs)^l I log( 



\H{fW 



d/, 



(28) 



which equals the classical Nyquist-rate (i.e. the analog) channel capacity derived in [1]. For non-monotone
channels, the optimal prefilter may not be a low-pass filter, as illustrated in Fig. 8. We plot in Fig. 8(b)
the optimal filter for the channel given in Fig. 8(a) when the sampling rate $f_s = 0.4 f_{nyq}$. It can be seen
that this filter is no longer a low-pass filter but is of support size $0.4 f_{nyq}$.




Figure 7. The sampled channel capacity vs sampling rate for a raised-cosine channel with an optimal prefilter. The channel 
bandwidth is assumed to be [— |, |], the power constraint P = 10, and the noise is white with flat spectral density cr^ = 1. 
The frequency response $H(f)$ of the channel is assumed to be a raised cosine function with $\beta = 0.9$ and $T = 1.6$. The tradeoff
curves for two types of prefilters are illustrated: (1) the optimal prefilter (which is a low-pass filter); (2) the matched filter 
whose frequency response obeys S{f) = H*{f). In the sub-Nyquist sampling rate regime, the optimal prefilter outperforms the 
matched filter, while the two curves coincide when sampling is performed above the Nyquist rate. 



3) Capacity Non-monotonicity: When the channel is not monotone, a somewhat counter-intuitive fact
arises: the channel capacity $C(f_s)$ is not necessarily a non-decreasing function of the sampling rate $f_s$.
Examples include specific multiband channels as illustrated in Fig. 9. Here, the Fourier transform of the
channel response is concentrated in two sub-intervals. Specifically, assuming that the entire bandwidth is

Figure 8. Capacity of optimally prefiltered channel: (a) frequency response of the original channel; (b) optimal prefilter 
associated with this channel for sampling rate 0.4; (c) optimally prefiltered channel response with sampling rate 0.4; (d) capacity 
vs sampling rate for the optimal prefilter and for the matched filter. The optimal prefilter has support size fs in the frequency 
domain, hence its output is alias-free. In the sub-Nyquist regime, this alias-suppressing filter outperforms the matched filter in 
the resulting capacity. 



contained in $[-1/2, 1/2]$ with Nyquist rate $f_{nyq} = 1$, the channel is given by



H{f) = 



1, if l/|e[^,i]U[i,^]; 

0, otherwise. 



(29) 



If the channel is sampled at a rate fs = |/nyq, then aliasing occurs and leads to an aliased channel with 
one subband (and hence one degree of freedom). However, if sampling is performed at a rate fs = |/nyq, 
it can be easily verified that the two subbands remain non-overlapping in the aliased channel, resulting 
in two degrees of freedom. The tradeoff curve between capacity and sampling rate with an optimal 
prefilter is plotted in Fig. 9. This curve indicates that increasing the sampling rate may not necessarily
increase capacity for certain channel structures. In other words, a single filter followed by sampling
largely constrains our ability to exploit channel and signal structures.

Figure 9. The sampled channel capacity vs sampling rate for a single filter followed by sampling and for a filter bank followed
by sampling for a bank of two filters and of four filters. The channel bandwidth is $[-1/2, 1/2]$, the power constraint $P = 10$, and
the noise power $\sigma^2 = 1$. For a single filter followed by sampling and for a bank of two filters followed by sampling, the capacity
is not monotonically increasing in the sub-Nyquist regime, while sampling above the Nyquist rate does not increase capacity. 
A filter bank with 4 filters followed by sampling achieves analog capacity when sampling at or above the Landau rate, which 
outperforms single-filter sampling. In general, when sampling is done below the Landau rate, capacity does not monotonically 
increase with sampling rate for either a single filter or a filter bank followed by sampling. 
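The non-monotonicity of capacity under a single optimal prefilter can be reproduced with a toy multiband channel. The sketch below is illustrative only: the band edges are hypothetical and are not claimed to be those of (29), the noise is assumed white with unit density, and the function names are made up. It measures the part of $[-f_s/2, f_s/2]$ in which at least one alias carries unit gain, then applies the flat-channel water-filling formula to that measure.

```python
import math

def active_measure(fs, bands, n_grid=2000, n_alias=20):
    """Length of the set of f in [-fs/2, fs/2] with an alias f - l*fs inside a band.

    `bands` lists (low, high) edges in |f|; the alias sum is truncated.
    """
    df = fs / n_grid
    count = 0
    for i in range(n_grid):
        f = -fs / 2 + (i + 0.5) * df
        aliases = (abs(f - l * fs) for l in range(-n_alias, n_alias + 1))
        if any(lo <= g <= hi for g in aliases for lo, hi in bands):
            count += 1
    return count * df

def flat_capacity(measure, P, noise_psd=1.0):
    """Capacity in bits for unit gain over `measure` Hz with white noise."""
    if measure <= 0.0:
        return 0.0
    return 0.5 * measure * math.log2(1.0 + P / (noise_psd * measure))

# Hypothetical two-subband channel with unit gain on these |f| ranges.
bands = [(0.1, 0.2), (0.3, 0.4)]
rates = [0.4, 0.5]
caps = [flat_capacity(active_measure(fs, bands), 10.0) for fs in rates]
```

For these assumed band edges, the subbands tile the sampling bandwidth without overlap at $f_s = 0.4$, but collide at $f_s = 0.5$, so the larger sampling rate yields the smaller capacity.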



IV. A Bank of Filters Followed by Sampling 

A. Main Results 

We now treat filter-bank sampling, in which the channel output is filtered and sampled through multiple
branches. Since the sampled outputs at these branches are all functions of the same input and noise,
and hence mutually dependent, the optimal transmission scheme must account for their correlation.
Specifically, the transmit signal should be chosen to decouple mutual interference across different
branches. This is reflected in the capacity expression given in Theorem 3.

In order to state our theorem formally, we introduce two Fourier symbol matrices $F_s$ and $F_h$. Here,
$F_s$ is an infinite matrix of $M$ rows and infinitely many columns and $F_h$ is a diagonal infinite matrix such that



Vl<i<fe,VlGZ: < * (30) 

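Theorem 3. Consider the system shown in Fig. 3. Assume that $h(t)$ and $s_i(t)$ ($1 \le i \le M$) are all continuous,
bounded and absolutely Riemann integrable. Additionally, assume that $h_\eta(t) := \mathcal{F}^{-1}\left( \frac{H(f)}{\sqrt{S_\eta(f)}} \right)$
satisfies $h_\eta(t) = o(t^{-\epsilon})$ for some constant $\epsilon > 1$, and that $F_s$ is right-invertible for every $f$. The capacity
$C(f_s)$ of the sampled channel with a power constraint $P$ can be given as

$$C(f_s) = \int_{-f_s/2M}^{f_s/2M} \frac{1}{2} \sum_{i=1}^{M} \left[ \log \left( \nu \lambda_i \left( \tilde{F}_s F_h F_h^* \tilde{F}_s^* \right) \right) \right]^+ df,$$

where

$$\int_{-f_s/2M}^{f_s/2M} \sum_{i=1}^{M} \left[ \nu - \frac{1}{\lambda_i \left( \tilde{F}_s F_h F_h^* \tilde{F}_s^* \right)} \right]^+ df = P.$$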



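Remark 2. Using the same argument as used by Telatar in [10], we can express this capacity alternatively
as

$$\max_{\{Q(f)\} \in \mathcal{Q}} \int_{-f_s/2M}^{f_s/2M} \frac{1}{2} \log \det \left( I_M + \tilde{F}_s F_h Q F_h^* \tilde{F}_s^* \right) df, \qquad (31)$$

where $\tilde{F}_s = (F_s F_s^*)^{-1/2} F_s$ and

$$\mathcal{Q} = \left\{ Q(f) : |f| \le \frac{f_s}{2M}, \ Q(f) \in S_+ ; \ \int_{-f_s/2M}^{f_s/2M} \mathrm{Tr}(Q(f)) \, df \le P \right\}. \qquad (32)$$
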



The optimal $\{Q(f)\}$ corresponds to a water-filling power allocation strategy based on the singular
values of the equivalent channel matrix $\tilde{F}_s F_h$, where $F_h$ is associated with the original channel and $\tilde{F}_s$
arises from prefiltering and noise whitening. For each $f \in [-f_s/2M, f_s/2M]$, the integrand in (31) can
be interpreted as a MIMO Gaussian channel capacity formula with degrees of freedom associated with
the frequency domain, as illustrated in Fig. 10(a). We still have full control over a countable number of
input branches $\{X(f - l f_s/M) : l \in \mathbb{Z}\}$, but this time we have $M$ receive branches instead of a single
branch (as in the MISO case for sampling following a single filter). The channel capacity can be achieved
when the transmit signals are designed to decouple this MIMO channel into $M$ parallel channels (and
hence $M$ degrees of freedom), each corresponding to one of its singular directions. Unlike traditional
MIMO Gaussian channels, the noise samples in each output sample set $\{y_i[n] : 1 \le i \le M\}$ result from
the same process $\eta(t)$ (as shown in Fig. 3(a)) and hence noise samples are correlated.



B. Approximate Analysis 

The sampled analog channel under filter banks followed by sampling can be studied through its
connection with MIMO Gaussian channels (see Fig. 10). Consider first a single frequency
$f \in [-f_s/2M, f_s/2M]$. Since we employ a bank of filters with each filter followed by an ideal uniform
sampler, the equivalent channel has $M$ receive branches, each corresponding to one branch of prefiltered
sampling at a rate $f_s/M$. The noise received in the $i$th branch is zero-mean Gaussian with spectral density

$$\sum_{l=-\infty}^{\infty} \left| S_i \left( f - \frac{l f_s}{M} \right) \right|^2 S_\eta \left( f - \frac{l f_s}{M} \right), \quad -\frac{f_s}{2M} \le f \le \frac{f_s}{2M}, \qquad (33)$$

indicating the mutual correlation of noise at different branches. The received noise vector can be whitened
by multiplying $Y(f) = [\cdots, Y(f), Y(f - f_s), \cdots]^T$ by an $M \times M$ whitening matrix $(F_s(f) F_s^*(f))^{-1/2}$.
Since whitening here is an invertible operation, it preserves capacity. After whitening, the channel of Fig.
10(a) associated with frequency $f$ has the following channel matrix

$$(F_s(f) F_s^*(f))^{-1/2} F_s(f) F_h(f) = \tilde{F}_s(f) F_h(f). \qquad (34)$$
MIMO Gaussian channel capacity results [10] immediately imply that the channel capacity at a given
frequency $f \in [-f_s/2M, f_s/2M]$ corresponding to the channel in Fig. 10(a) can be expressed as

$$\max_{Q} \frac{1}{2} \log \det \left[ I + \tilde{F}_s(f) F_h(f) Q(f) F_h^*(f) \tilde{F}_s^*(f) \right] \qquad (35)$$


subject to the constraints $\mathrm{trace}(Q(f)) \le P(f)$ and $Q(f) \in S_+$, where $Q(f)$ denotes the power allocation
matrix. Ranging over all $f \in [-f_s/2M, f_s/2M]$, we have the set of parallel MIMO channels for each
frequency $f$ illustrated in Fig. 10(b), where each MIMO channel in this figure is equivalent to the set of
$M$ parallel channels in Fig. 10(a) for the given frequency. Performing the water-filling power allocation
strategy across all parallel channels leads to our capacity expression.

Figure 10. Equivalent representations for a bank of $M$ filters followed by sampling: (a) Equivalent MIMO Gaussian channel
for a single $f \in [-f_s/2M, f_s/2M]$; (b) An equivalent set of parallel channels representing all $f \in [-f_s/2M, f_s/2M]$, where
the MIMO channel at a given frequency $f$ is equivalent to the MIMO channel of Fig. 10(a).

C. Optimal Filter Bank

1) Derivation of optimal banks of filters: In general, $\log \det [I_M + \tilde{F}_s F_h Q F_h^* \tilde{F}_s^*]$ is not perfectly
determined by $F_s(f)$ and $F_h(f)$ at a single frequency $f$, but also depends on the water-level due to the
fact that the optimal power allocation strategy relies on the power constraint $P/\sigma^2$ as well as $F_s$ and
$F_h$ across all $f$. In other words, $\log \det [I_M + \tilde{F}_s F_h Q F_h^* \tilde{F}_s^*]$ is a function of all singular values of $\tilde{F}_s F_h$
and the universal water-level associated with optimal power allocation. Given two sets of singular values,
we cannot determine which set is preferable without accounting for the water-level, unless one set is
element-wise larger than the other. That said, if there exists a prefilter that maximizes all singular values
simultaneously, then this prefilter will be universally optimal regardless of the water-level. Fortunately,
such optimal schemes exist, as we characterize in Corollary 2.

Since $F_h(f)$ is a diagonal matrix, $\lambda_k(F_h F_h^*)$ denotes the $k$th largest entry of $F_h F_h^*$ or, equivalently,
the $k$th largest element in the set $\{ [F_h F_h^*]_{l,l} : l \in \mathbb{Z} \}$ of its diagonal entries. The optimal filter bank can then be given
as follows.



Corollary 2. Consider the system shown in Fig. 3. Suppose that for each aliased set $\{f - l f_s/M : l \in \mathbb{Z}\}$
and each $k$ ($1 \le k \le M$), there exists an integer $l$ such that $[F_h F_h^*]_{l,l}$ is equal to the $k$th largest
element in $\{ [F_h F_h^*]_{i,i} : i \in \mathbb{Z} \}$. The capacity (31) under filter-bank sampling can be maximized
by a bank of filters for which the frequency response of the $k$th filter is given by

$$S_k \left( f - \frac{l f_s}{M} \right) = \begin{cases} 1, & \text{if } [F_h(f) F_h^*(f)]_{l,l} = \lambda_k \left( F_h(f) F_h^*(f) \right); \\ 0, & \text{otherwise}, \end{cases} \qquad (36)$$

for all $l \in \mathbb{Z}$, $1 \le k \le M$ and $f \in \left[ -\frac{f_s}{2M}, \frac{f_s}{2M} \right]$. The resulting maximum channel capacity is given as

$$C(f_s) = \frac{1}{2} \int_{-f_s/2M}^{f_s/2M} \sum_{k=1}^{M} \left( \log \left( \nu \cdot \lambda_k \left( F_h F_h^* \right) \right) \right)^+ df, \qquad (37)$$

where $\nu$ is chosen such that

$$\int_{-f_s/2M}^{f_s/2M} \sum_{k=1}^{M} \left[ \nu - \frac{1}{\lambda_k \left( F_h F_h^* \right)} \right]^+ df = P. \qquad (38)$$

Here, we use the notation $(x)^+$ to denote $\max\{x, 0\}$.
Proof: See Appendix D.



The choice of prefilters in (36) achieves the upper bounds on all singular values, and is hence universally
optimal regardless of the water level. Since $\tilde{F}_s$ has orthonormal rows, it acts as an orthogonal projection
and outputs the $M$-dimensional subspace closest to the channel space spanned by $F_h$. By observing that
the rows of the diagonal matrix $F_h$ are orthogonal to each other, the subspace with the strongest signal
strength corresponds to the $M$ rows of $F_h$ that contain the highest channel gain out of the entire aliased
frequency set $\{f - l f_s/M \mid l \in \mathbb{Z}\}$. Hence, the maximum data rate is achieved when the filter bank outputs
$M$ frequencies with the highest SNR among the set of frequencies equivalent modulo $f_s/M$ and suppresses
noise from all other branches. Define the aliased frequency set $B_k(f) = \{f + k f_s/M + l f_s \mid l \in \mathbb{Z}\}$. Then
sampling following a single optimal filter selects the strongest frequency from each set $B_k(f)$ ($0 \le k < M$),
while sampling following an optimal filter bank selects the $M$ largest points from $B_0(f) \cup B_1(f) \cup \cdots \cup B_{M-1}(f)$,
and hence outperforms a single-branch with a filter followed by sampling.

To illustrate this, consider the example given in Fig. 11, where we compare sampling following a
single filter and sampling following two filters, with optimal filters designed in each case. Consider two
aliased sets $B_1 = \{f - l f_s : l \in \mathbb{Z}\}$ and $B_2 = \{f + f_s/2 - l f_s : l \in \mathbb{Z}\}$. Single-branch sampling extracts
out the frequency with the best SNR from $B_1$ and another one from $B_2$, while two-branch sampling can
select the two frequencies with the best SNRs from $B_1 \cup B_2$. In the example shown in Fig. 11, the latter
corresponds to selecting two frequencies from $B_1$, which strictly outperforms sampling following a single
branch.
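The difference between the two selection rules can be made concrete with a small sketch. The SNR values below are hypothetical and the function names are illustrative; each inner list holds the SNRs at the frequencies of one aliased set, truncated to three points.

```python
def single_filter_selection(aliased_sets):
    """One optimal filter: keep the best SNR from each aliased set separately."""
    return [max(s) for s in aliased_sets]

def filter_bank_selection(aliased_sets, m):
    """Optimal bank of m filters: keep the m best SNRs from the union of the sets."""
    pooled = sorted((x for s in aliased_sets for x in s), reverse=True)
    return pooled[:m]

# Hypothetical SNRs: the first aliased set dominates the second one.
set_1 = [9.0, 7.0, 1.0]
set_2 = [2.0, 0.5, 0.1]

single = single_filter_selection([set_1, set_2])
bank = filter_bank_selection([set_1, set_2], m=2)
```

Here the single filter is forced to take one frequency from the weaker set, while the two-filter bank takes both of its frequencies from the stronger set, as in the situation the example describes.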

D. Discussion and Numerical Examples 

We note that in a monotone channel, the optimal filter bank will sequentially crop out the $M$ best
frequency bands, each of bandwidth $f_s/M$. Concatenating all of these frequency bands results in a low-pass
filter with cut-off frequency $f_s/2$, which is equivalent to single-branch sampling with an optimal
filter. In other words, using filter banks harvests no gain in capacity compared to a single branch with
a filter followed by sampling. In this case, the sampled capacity with the optimal filter bank increases
monotonically with the sampling rate up to the Nyquist-rate capacity.

For more general channels, however, the capacity is not a monotone function of $f_s$. Consider again the
multiband sparse channel where the channel response is concentrated in two sub-intervals, as illustrated
in Fig. 9(a). As discussed above, sampling following a single filter only allows us to select the best $f$
out of the set $\{f - l f_s : l \in \mathbb{Z}\}$, while sampling following filter banks allows us to select the best $f$
out of the set $\{f - l f_s/M : l \in \mathbb{Z}\}$. For example, when many frequencies in the set $\{f - l f_s : l \in \mathbb{Z}\}$ have
higher channel gain than all points in the set $\{f + f_s/2 - l f_s : l \in \mathbb{Z}\}$, filter-bank sampling allows these
desirable frequencies to be used for multiple branches. In the single-filter sampling, however, at most
one from each set can be effectively used. Thus, the sampled channel capacity with a filter bank exceeds





Figure 11. Sampling following a single filter vs. sampling following two filters. The blue solid lines represent the SNRs at
frequencies in the aliased frequency set $\{f - l f_s : l \in \mathbb{Z}\}$, while the red dotted lines represent the SNRs at frequencies in the
aliased set $\{f + f_s/2 - l f_s : l \in \mathbb{Z}\}$. Sampling following an optimal prefilter must select the frequency with the best SNR from
each aliased set separately, while sampling following 2 filters can simultaneously select two frequencies from the same aliased
set, thus strictly outperforming sampling following a single filter.



that with a single filter, but neither capacity is monotonically increasing in $f_s$. This is shown in Fig.
9(b). Specifically, we see in this figure that when we apply a bank of two filters prior to sampling, the
capacity curve is still non-monotonic but outperforms a single filter followed by sampling.

Another consequence of our results is that when the number of prefilters is appropriately chosen, the
Nyquist-rate channel capacity can be achieved by sampling at any rate above the Landau rate. In order to
show this, we introduce the following notion of a channel permutation. We call $\tilde{H}(f)$ a permutation of a
channel response $H(f)$ at rate $f_s$ if, for any $f$, $\{\tilde{H}(f - l f_s) : l \in \mathbb{Z}\} = \{H(f - l f_s) : l \in \mathbb{Z}\}$. In other
words, $[\cdots, \tilde{H}(f - f_s), \tilde{H}(f), \tilde{H}(f + f_s), \cdots]$ is a permutation of $[\cdots, H(f - f_s), H(f), H(f + f_s), \cdots]$
for every $f$. The following proposition characterizes a sufficient condition that allows the Nyquist-rate
channel capacity to be achieved at any sampling rate above the Landau rate. This is an immediate
consequence of the data processing inequality which implies that permutation of the channel response at
rate $f_s/M$ preserves capacity.

Proposition 1. If there exists a permutation $\tilde{H}(f)$ of $H(f)$ at rate $f_s/M$ such that the support of $\tilde{H}(f)$
is $\left[ -\frac{f_L}{2}, \frac{f_L}{2} \right]$, then optimal sampling following a bank of $M$ filters achieves Nyquist-rate capacity when
$f_s \ge f_L$.



Examples of channels satisfying Proposition 1 include any multiband channel with $N$ subbands among which $K$ subbands have non-zero channel gain. For any $f_s \ge f_L = \frac{K}{N} f_{\text{NYQ}}$, we are always able to permute the channel at rate $f_s/K$ to generate a band-limited channel of bandwidth $f_L$. Hence, sampling above the Landau rate following $K$ filters achieves the Nyquist-rate channel capacity. This is illustrated in Fig. 9, where a four-branch filter bank followed by sampling has a higher capacity than that with a single prefilter followed by sampling, and achieves the Nyquist-rate capacity whenever $f_s$ is at least the Landau rate.

V. Modulation and Filter Banks Followed by Sampling 

A. Main Results 

We now treat modulation and filter banks followed by sampling. The Fourier transform of each of the periodic modulation sequences $q_i(t)$ is a delta train with spacing equal to the inverse period $1/T_q$. Since multiplication in the time domain corresponds to convolution in the spectral domain, the modulation bank mixes frequency components among different aliased sets. This is reflected in Theorem 4, which characterizes the sampled analog channel capacity.

Assume that $\tilde{T}_s := MT_s = \frac{b}{a} T_q$ where $a$ and $b$ are coprime integers, and that the Fourier transform of $q_i(t)$ is given as $\sum_l c_i^l \delta(f - lf_q)$. Before stating our theorem, we introduce the following two Fourier symbol matrices $\mathbf{F}^g$ and $\mathbf{F}^h$. The $aM \times \infty$-dimensional matrix $\mathbf{F}^g$ contains $M$ submatrices with the $\alpha$th submatrix given by an $a \times \infty$-dimensional matrix $\mathbf{F}^g_\alpha \mathbf{F}^p_\alpha$. Here, for any $v \in \mathbb{Z}$, $1 \le l \le a$, and $1 \le \alpha \le M$, we have

Also, $\mathbf{F}^p_\alpha$ and $\mathbf{F}^h$ are infinite diagonal matrices such that for all $l \in \mathbb{Z}$



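Theorem 4. Consider the system shown in Fig. [?]. Assume that $h(t)$, $p_i(t)$ and $s_i(t)$ ($1 \le i \le M$) are all continuous, bounded and absolutely Riemann integrable, $\mathbf{F}^g$ is right invertible, and that the Fourier transform of $q_i(t)$ is given as $\sum_l c_i^l \delta(f - lf_q)$. Additionally, suppose that $h_\eta(t) := \mathcal{F}^{-1}\left(H(f)/\sqrt{S_\eta(f)}\right)$ satisfies $h_\eta(t) = o(t^{-\epsilon})$ for some constant $\epsilon > 1$. We also assume that $aMT_s = bT_q$ where $a$ and $b$ are coprime integers. The capacity $C(f_s)$ of the sampled channel with a power constraint $P$ is given by

$$C(f_s) = \int_{-\frac{f_s}{2aM}}^{\frac{f_s}{2aM}} \frac{1}{2} \sum_{i=1}^{aM} \left[ \log\left( \nu \lambda_i\left( (\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}} \mathbf{F}^g\mathbf{F}^h\mathbf{F}^{h*}\mathbf{F}^{g*} (\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}} \right) \right) \right]^+ df, \qquad (39)$$

where $\nu$ is chosen such that

$$P = \int_{-\frac{f_s}{2aM}}^{\frac{f_s}{2aM}} \sum_{i=1}^{aM} \left[ \nu - \frac{1}{\lambda_i\left( (\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}} \mathbf{F}^g\mathbf{F}^h\mathbf{F}^{h*}\mathbf{F}^{g*} (\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}} \right)} \right]^+ df. \qquad (40)$$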

Remark 3. The right invertibility of $\mathbf{F}^g$ ensures that the sampling method is non-degenerate, e.g. the modulation sequence cannot be zero.

The optimal $\nu$ corresponds to a water-filling power allocation strategy based on the singular values of the equivalent channel matrix $(\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}} \mathbf{F}^g\mathbf{F}^h$, where $(\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}}$ is due to noise prewhitening and $\mathbf{F}^g\mathbf{F}^h$ is the equivalent channel matrix after modulation and filtering. This result can again be interpreted by viewing (39) as the MIMO Gaussian channel capacity of the equivalent channel matrix. We note that a closed-form capacity expression may be hard to obtain for general modulated sequences $q_i(t)$. That is because the multiplication operation corresponds to convolution in the frequency domain, which does not preserve Toeplitz properties of the original operator associated with the channel filter. However, when $q_i(t)$ is periodic, it can be mapped to a spike train in the frequency domain, which still exhibits block Toeplitz properties, as we describe in more detail in Appendix C.
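The water-filling step that determines $\nu$ can be sketched for a finite set of parallel channels at a single frequency. This is a minimal sketch, not the paper's procedure: it assumes the eigenvalues $\lambda_i$ of the whitened equivalent channel are already available as a finite list, it ignores the integration over frequency, and the function name `water_fill` and the example gains are hypothetical.

```python
import numpy as np

def water_fill(gains, power):
    """Water-filling over parallel channels with gains lambda_i.

    Returns the water level nu, the per-channel power (nu - 1/lambda_i)^+,
    and the rate 0.5 * sum_i [log(nu * lambda_i)]^+ in nats.
    """
    gains = np.asarray(gains, dtype=float)
    inv = np.sort(1.0 / gains)                 # inverse gains, best channel first
    for m in range(len(inv), 0, -1):           # try the m best channels
        nu = (power + inv[:m].sum()) / m
        if nu > inv[m - 1]:                    # all m channels get positive power
            break
    alloc = np.maximum(nu - 1.0 / gains, 0.0)
    rate = 0.5 * np.sum(np.log(np.maximum(nu * gains, 1.0)))
    return nu, alloc, rate

# Hypothetical eigenvalues of the whitened equivalent channel (assumed values).
nu, alloc, rate = water_fill([4.0, 1.0], power=1.0)
print(nu, alloc, rate)
```

When the power budget is small, the loop drops the weakest channels, which mirrors the $[\cdot]^+$ operations in (39) and (40).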

B. Approximate Analysis 

The Fourier transform of the signal prior to modulation in the $i$th branch at a given frequency $f$ can be expressed as $P_i(f)\left(H(f)X(f) + N(f)\right)$. Multiplication of this prefiltered signal with the modulation sequence $q_i(t)$ corresponds to convolution in the frequency domain. The modulation sequence $q_i(t)$ is assumed to be periodic with frequency response $\sum_l c_i^l \delta(f - lf_q)$. Define

$$R(f) := H(f)X(f) + N(f).$$

The channel output is sampled at a rate $\tilde{f}_s = f_s/M$ in the $i$th branch. We observe that $T_q$ does not coincide with $\tilde{T}_s := MT_s$, and that the sampling system is periodic with period $bT_q = a\tilde{T}_s$. Specifically, if we denote by $h(t, \tau)$ the output of the sampling system at time $t$ due to an input at time $\tau$, then $h(t - bT_q, \tau - bT_q) = h(t, \tau)$. We therefore divide all samples in the $i$th branch into $a$ groups, where the $l$th ($0 \le l < a$) group contains $\{y_i[l + ka] \mid k \in \mathbb{Z}\}$, as illustrated in Fig. 12(a). Hence, each group is sampled uniformly at rate $f_q/b$. The sampling system, when restricted to the output on each group of the sampling set, can be treated as LTI, thus justifying its equivalent representation in the spectral domain. For the $i$th branch, denote by

$$g_i(t, \tau) := \int s_i(t - \tau_1)\, q_i(\tau_1)\, p_i(\tau_1 - \tau)\, d\tau_1$$

the output response of the preprocessing system at time $t$ due to an input impulse at time $\tau$. Then, the equivalent impulse response of the sampling system for the $l$th group is given by $g_i^l(t) := g_i(l\tilde{T}_s, l\tilde{T}_s - t)$. Thus, the equivalent Fourier transform of the system output before ideal sampling in the $l$th group of the $i$th branch can be written as

^m)Rif) (^Siif) exp [j27rflt) * c^S (/ - uf,)^ 
^PiimU) E ^r-Si (/ - uU) exp (i27r;r, (/ - ufS) , 

u 

which further leads to the sampled sequence in the /th group of the ith branch as 

^/(/)-E^/(/-x) 

= E p. (/ - ^) « (/ - f ) E (/ -A - ^) (^-^''ft (/-«/.- ^) ) 

where 

M^, : = E cr^i (/ - - ^) exp (^i27rlf, - - ^ 

Fig. [12] illustrates this representation for sampling with a single branch of modulation and filtering when 
fs = 2/g. All the information of the entire sampled data is contained in |0</<a, l<z< M}, 

and hence the sampling system can be equivalently represented as a MIMO channel with an infinite number of input branches and $aM$ output branches.
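The grouping of one branch's samples into $a$ interleaved subsequences, each of which is uniformly spaced, can be sketched as follows. This is only an illustration: the function name `split_into_groups` and the finite sample list are assumptions, whereas the text works with infinite sequences.

```python
def split_into_groups(samples, a):
    """Return a lists; group l holds samples[l], samples[l + a], samples[l + 2a], ..."""
    return [samples[l::a] for l in range(a)]

# Hypothetical branch output, labelled by sample index for readability.
branch_samples = list(range(7))
groups = split_into_groups(branch_samples, a=3)
print(groups)
```

Each group keeps every $a$th sample of the branch, so within a group the sampling is uniform and the system can be treated as LTI.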

Due to the convolution operation in the spectral domain, the frequency response of the sampled output at frequency $f$ becomes a linear combination of frequency components $\{X(f)\}$ and $\{N(f)\}$ from several different aliased frequency sets. Here, we introduce the definition of a modulated aliased frequency set as a generalization of the aliased set. Specifically, for each $f$, its modulated aliased set is the set $\{f - lf_q - k\tilde{f}_s \mid l, k \in \mathbb{Z}\}$. By our assumption that $f_q = \frac{b}{a}\tilde{f}_s$ with $a$ and $b$ being relatively prime, simple results in number theory imply that

$$\{f_0 - lf_q - k\tilde{f}_s \mid l, k \in \mathbb{Z}\} = \left\{f_0 - l\frac{\tilde{f}_s}{a} \,\middle|\, l \in \mathbb{Z}\right\} = \left\{f_0 - l\frac{f_q}{b} \,\middle|\, l \in \mathbb{Z}\right\}. \qquad (41)$$

Footnote: We note that although each modulated aliased set is countable, it may be a dense set when $f_q/f_s$ is irrational. Under the assumption in Theorem 4, however, the elements in the set have a minimum spacing of $f_q/b$.
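The number-theoretic fact behind (41) is that, measured in units of $f_q/b$, the offset $lf_q + k\tilde{f}_s$ equals $lb + ka$, and every integer can be written in this form when $a$ and $b$ are coprime. The sketch below checks this on a finite window. The helper names and the chosen values of $a$ and $b$ are hypothetical.

```python
def reachable_offsets(a, b, span):
    """Integers m = l*b + k*a with |l|, |k| <= span (offsets in units of f_q / b)."""
    return {l * b + k * a
            for l in range(-span, span + 1)
            for k in range(-span, span + 1)}

def covers_window(a, b, span):
    """True if every integer in [-span, span] is reachable as l*b + k*a."""
    return set(range(-span, span + 1)) <= reachable_offsets(a, b, span)

print(covers_window(3, 2, 10))  # coprime pair: the set has spacing f_q / b
print(covers_window(4, 2, 10))  # not coprime: only even multiples are reached
```

The second call shows why coprimality matters: with a common factor the modulated aliased set is coarser than the grid of spacing $f_q/b$.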



In other words, for a given $f_0 \in [-f_q/2b, f_q/2b]$, the sampled output at $f_0$ depends on the input in the entire modulated aliased set. Since the sampling bandwidth at each branch is $\tilde{f}_s$, all outputs at frequencies $\{f_0 - lf_q/b \mid l \in \mathbb{Z};\ -\tilde{f}_s/2 \le f_0 - lf_q/b \le \tilde{f}_s/2\}$ rely on the inputs in the same modulated aliased set. This can be treated as a Gaussian MIMO channel with a countable number of input branches at the frequency set $\{f_0 - lf_q/b \mid l \in \mathbb{Z}\}$ and $aM$ groups of output branches, each associated with one group of sample



sequences in one branch. As an example, we illustrate in Fig. 12 the equivalent MIMO Gaussian channel 



under sampling following a single branch of modulation and filtering, when S{f) = for all / ^ 

2 ' 2 • 

The effective frequencies of this frequency-selective MIMO Gaussian channel range from $-f_q/2b$ to $f_q/2b$, which gives us a set of parallel channels each representing a single frequency $f$. The water-filling power allocation strategy is then applied to achieve capacity.

The rigorous analysis of Theorem 4 based on Toeplitz properties is deferred to Appendix C.

C. An Upper Bound on Sampled Capacity 



Following the same analysis of optimal filter-bank sampling as in Section IV-C, we can derive an upper bound on the sampled channel capacity, as characterized in Corollary 3.

Corollary 3. Consider the system shown in Fig. [?]. Suppose that for each aliased set $\{f - if_q/b \mid i \in \mathbb{Z}\}$ and each $k$ ($1 \le k \le aM$), there exists an integer $l$ such that $|H(f - lf_q/b)|^2$ is equal to the $k$th largest element in $\{|H(f - if_q/b)|^2 \mid i \in \mathbb{Z}\}$. The capacity (39) under sampling following modulation and filter banks can be upper bounded by



1 rfq/'^h 

Cifs) ^7^1 E (log • i^hn)))+ d/, (42) 



where v is chosen such that 



I 



fj2b ^ 

J2[''-hiFhK)]+df = P. (43) 

/./2n=i 



Proof: By observing that $(\mathbf{F}^g\mathbf{F}^{g*})^{-\frac{1}{2}}\mathbf{F}^g$ has orthonormal rows, we can immediately derive the result using Proposition 5. ■
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Figure 12. (a) Grouping of the sampling set when fs = 3/g. The sampling grid is divided into 3 groups, where each group 
forms a uniform set with rate /s/3; (b) Equivalent MIMO Gaussian channel for a given / G [O, ^) under sampling following 
a single branch of modulation and filtering, where /g = |/s ; (c) An equivalent set of parallel MIMO channels representing 
all / G [— /g/26, fq/2b], where the MIMO channel at a given frequency is equivalent to the MIMO channel of Fig. 12 a). 
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It can be observed that this upper bound coincides with the upper bound on sampled capacity under 
$aM$-branch filter-bank sampling. This basically implies that for a given sampling rate $f_s$, modulation and
other words, we can always achieve the same performance by adding more branches in the filter-bank 
sampling. 

We caution that this upper bound may not be tight for many scenarios, since we restrict our analysis 
framework to periodic modulation sequences. A general optimal modulation has not been identified in 
this work, which is left for future investigation. 



D. Single-branch Sampling with Modulation and Filtering v.s. Filter-bank Sampling 

Although the class of modulation and filter bank sampling does not provide capacity gain compared with 
filter-bank sampling, it may potentially provide implementation advantages, depending on the modulation 
period Tq. We consider here two special cases of single-branch modulation sampling, and investigate 
whether any hardware benefit can be harvested. 

1) $f_q = af_s$ for some integer $a$: In this case, the modulated aliased set is $\{f - kf_s - lf_q \mid k, l \in \mathbb{Z}\} = \{f - kf_s \mid k \in \mathbb{Z}\}$, which is equivalent to the original aliased frequency set. That said, the sampled output $Y(f)$ is still a linear combination of $\{H(f - kf_s)X(f - kf_s) + N(f - kf_s) \mid k \in \mathbb{Z}\}$. But since linear combinations of these components can be attained by simply adjusting the prefilter response $S(f)$, the modulation bank does not provide any more design degrees of freedom. Therefore, the maximum sampled channel capacity achievable by adding an additional modulation bank is no larger than the one achievable without the modulation sequences.

2) $f_s = bf_q$ for some integer $b$: In this case, the modulated aliased set is enlarged to $\{f - kf_s - lf_q \mid k, l \in \mathbb{Z}\} = \{f - lf_q \mid l \in \mathbb{Z}\}$, which may potentially provide implementation gain compared with filter-bank sampling with the same number of branches. We consider the following example. Suppose that the channel contains 3 flat subbands with channel gains as plotted in Fig. 13, and that the noise is of unit spectral density within these 3 subbands and zero otherwise. Here, single-branch filtering followed by sampling is employed, where the sampling rate is $f_s = 2$ and the period of the modulation sequence is $T_q = 2T_s$. Due to aliasing, Subband 1 and Subband 3 (as illustrated in Fig. 13) are mixed together. According to Section III-D, the optimal prefilter without modulation would be a band-pass filter with passband $-1.5 \le f \le 0.5$, resulting in a channel consisting of two subbands with respective channel gains 2 and 1.

Figure 13. The left plot illustrates the channel gain, where the sampling rate is $f_s = 2$. The right plot illustrates the signal components of the sampled response under modulation followed by sampling.



Now if we employ modulation sampling with a lowpass filter, the channel structure can be better 
exploited. Specifically, suppose that the modulation sequence has a period of 2Ts and obeys = 1, 

= 100, = 1, = 1000 and = for all other i's, and that the cutoff frequency of the low-pass 
filter is /cutoff = 1. By simple manipulation, 

X{f) + N{f) 



exp(j7rr,(/-^))y(/-£^) 
exp(j7rrj)y(/) 



2 100 1001 
100 1 



2X(f 



L 



+ N^f 



+ 



for all / G 



^ 



Through noise whitening and eigenvalue decomposition, we can derive a pair of equivalent parallel channels experiencing respective channel gains 2 and 1.99, which outperforms non-modulated sampling following optimal filtering. As illustrated in Fig.



13 



primarily depends 



on the frequency component at / + ^, while Y (/) primarily depends on the frequency component at 
f — Y' ^^^^ frequencies have SNR 4. In fact, by increasing and correspondingly, we can obtain 
a two-subband channel with respective channel gains both arbitrarily close to 2. That said, sampling 
following a single branch of modulation can achieve the same capacity as applying the optimal filter 
bank. 

More generally, let us consider the following scenario. Suppose that the channel of bandwidth W = 
^fs is equally divided into 2L subbands each of bandwidth = -^/^ for some integers K and L. The 

SNR $|H(f)|^2 / S_\eta(f)$ within each subband is assumed to be flat. For instance, in the presence of white noise, if $f_q \ll B_c$ with $B_c$ being the coherence bandwidth [48], the channel gain (and hence the SNR) is roughly
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equal across the subband. Take any / G 
the modulation sequence. 



2 ' 2 



, and run the following simple algorithm to determine 



Algorithm 1 

1. Initialize. Find the K largest elements in | ^^^{j-if) :/gZ,-L<Z<L-i|. Denote 
by {/i : 1 < i < K} the index set of these K elements such that h > I2 > " - > Ik- Set i = 1, 
^max = —L, J' = 0, and = for all i G Z. Let A be a large given number. 

2. For i = 1 : 

For m = /max : /max + K -1 

if (m mod K) ^ J', do 

= U {m mod K}, 4 = m, J^ax = m + L-l-k, d^'^' = A^^^'' 

and break; 

3. For i —L : L — 1 

if i ^ {/i, • • • JkI then S {f + ifp) = 1; 
else5(/ + i/g) = 0. 

The goal of this algorithm is to generate $K$ subbands with high SNR. Due to convolution, the signal in each subband is a linear combination of the frequency components at all frequencies in the modulated aliased set. Adjusting the values of $\{c^i : i \in \mathbb{Z}\}$ results in different weights for each component. Here, the signal in each subband selected through Step 2 will contain one primary component accounting for most of the power of the entire signal. Filtering is further used in Step 3 in order to suppress aliasing. The performance of this algorithm is characterized in the following proposition.

Proposition 2. Consider the piecewise flat channel with $2L$ subbands as described above. For a given $f_q$, the modulation sequence found by Algorithm 1 maximizes capacity when $A \to \infty$.

In fact, the performance of this algorithm is asymptotically equivalent to the one using an optimal 
filter bank followed by sampling with sampling rate fq at each branch. Hence, single-branch sampling 
effectively achieves the same performance as multi-branch filter-bank sampling. This is in general 
the preferred approach since building multiple analog filters is more expensive (in terms of power 
consumption, size, or cost) than a single modulator. We note, however, that for a given overall sampling 
rate, modulation-bank sampling does not outperform filter-bank sampling in terms of sampled capacity, 
which is formally stated as follows. 
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Proposition 3. Consider the setup in Theorem |?] For a given overall sampling rate fs, sampling with 
M branches of optimal modulation and filter banks does not achieve higher sampled capacity compared 
to sampling with an optimal bank of aM filters. 

Hence, the main advantage of applying a modulation bank is a hardware benefit, namely, using fewer 
branches and hence less analog circuitry to achieve the same capacity. 



VI. Connections between Capacity and MMSE 



In Section III-D and Section IV-C, we derived respectively the optimal prefilter and the optimal filter bank that maximize capacity. It turns out that such choices of sampling methods coincide with the optimal prefilter / filter bank that minimize the MSE between the channel input and the signal reconstructed from sampling the channel output, as detailed below.

The sampling problem we consider can be formally stated as follows. For a given analog channel, suppose that the channel input $x(t)$ is any zero-mean WSS stochastic signal whose power spectral density (PSD) $S_x(f)$ satisfies a power constraint $\int_{-\infty}^{\infty} S_x(f)\,df = P$. This input is passed through the channel where it is filtered by the channel frequency response and contaminated by stationary Gaussian noise. We sample the channel output using a filter bank at a fixed rate $f_s/M$ in each branch, and recover an MMSE estimate $\hat{x}(t)$ of $x(t)$ from its samples in the sense of minimizing $E\left(|x(t) - \hat{x}(t)|^2\right)$ for $t \in \mathbb{R}$. Since the samples $\{y[n]\}$ are Gaussian random variables, the MMSE estimate $\hat{x}(t)$ for a given input process $x(t)$ is linear in $\{y[n]\}$ [50]. We propose to jointly optimize $x(t)$ and the sampling method. Specifically, our joint optimization problem can now be posed as follows: for which input process $x(t)$ and for which filter bank is the estimation error $E\left(|x(t) - \hat{x}(t)|^2\right)$ minimized for $t \in \mathbb{R}$?



In this joint optimization, it turns out that the optimal input and the optimal filter bank coincide with 
those maximizing channel capacity, which is captured in the following proposition. 

Proposition 4. Suppose the channel input $x(t)$ is any WSS signal. For a given sampling system, let $\hat{x}(t)$ denote the corresponding optimal linear estimate of $x(t)$ from the digital sequence $\{y[n]\}$. Then
the optimal filter bank given in ([j6|) and its corresponding optimal input x{t) minimizes the MSE 



reconstruction error $E\left(|x(t) - \hat{x}(t)|^2\right)$ over all possible LTI filter banks.



^Instead of giving an analysis that accommodates a general class of signals, we restrict our attention to wide-sense stationary 
(WSS) Gaussian input signals. This restriction, while falling short of generality, allows us to derive informative sampling results 
in a simple way. 
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Proof: See Appendix |F| 

■ 

Proposition 4 implies that the input signal and the filter bank optimizing channel capacity also minimize the MSE between the original input signal and its reconstructed output. Thus, under sampling with filter banks, information theory reconciles with sampling theory through the SNR metric when determining optimal systems. Intuitively, high SNR typically leads to large capacity and low MSE.
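The intuition that high SNR gives both large capacity and low MSE can be made concrete for the simplest case. The sketch below is an illustration only, under an assumption that is much narrower than the setting of Proposition 4: a single scalar Gaussian channel $y = hx + n$ with $x \sim \mathcal{N}(0, P)$, for which the capacity is $\frac{1}{2}\log_2(1 + \text{SNR})$ and the MMSE of $x$ given $y$ is $P/(1 + \text{SNR})$. The function names are hypothetical.

```python
import math

def capacity_bits(snr):
    """Capacity of a scalar Gaussian channel in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr)

def mmse(power, snr):
    """MMSE of a Gaussian input of variance `power` observed at the given SNR."""
    return power / (1.0 + snr)

# Sweep a few assumed SNR values: capacity rises while the MMSE falls.
for snr in (0.5, 1.0, 3.0, 15.0):
    print(snr, capacity_bits(snr), mmse(1.0, snr))
```

Both quantities are monotone functions of the same SNR, so a choice that raises the SNR of the retained frequencies helps under either metric.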

Proposition 4 includes the optimal prefilter under single-prefilter sampling as a special case. That said, the optimal prefilter puts all its mass in the frequency with the highest SNR in each aliased frequency set, and thus suppresses aliasing, which coincides with the capacity-optimizing filter derived in Section III-D. We note that a similar MSE minimization problem was investigated decades ago with applications in pulse amplitude modulation (PAM) [40], [41]: a given random input $x(t)$ is prefiltered, corrupted by noise, uniformly sampled, and then postfiltered to yield a linear estimate $\hat{x}(t)$; the goal in that work was to minimize the MSE between $x(t)$ and $\hat{x}(t)$ over all prefiltering (or pulse shaping) and postfiltering mechanisms. While our problem differs from this PAM design problem by optimizing directly over the random input instead of the pulse shape, the two problems are similar in spirit and result in the same alias-suppressing filter. However, earlier work did not account for filter-bank sampling or make connections between minimizing MSE and maximizing capacity, as we do in Proposition 4.

VII. Conclusions and Future Work 

We characterize sampled channel capacity as a function of sampling rate for different sampling mechanisms, thereby forging a new connection between sampling theory and information theory. We show that the capacity of a sampled analog channel degrades with reduced sampling rate and identify optimal sampling structures for several classes of sampling methods, which exploit structure in the sampling design. These results also indicate that capacity is not always monotonic in sampling rate, and illuminate an intriguing connection between MIMO channel capacity and capacity of undersampled analog channels. Our work establishes a framework for using the information-theoretic metric of capacity to optimize sampling structure, offering a different angle from traditional design of sampling methods based on other statistical measures such as MSE.

Our work also opens more questions at the intersection of sampling theory and information theory. 
For instance, an upper bound on sampled capacity under sampling rate constraints for more general 
nonuniform sampling methods would allow us to evaluate which sampling mechanisms are capacity- 
achieving for any channel. Moreover, for channels where there is a gap between achievable rates and 
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the capacity upper bound, these results might provide insight into new sampling mechanisms that might 
achieve or at least close the gap to capacity. Investigation of capacity under more general nonuniform 
sampling techniques is a topic of future work. In addition, the optimal sampling structure for time-varying 
channels will require different analysis than used in the time-invariant case, and it remains to be seen 
what sampling mechanisms are optimal for channels when the channel state is partially or fully unknown. 
A deeper understanding of how to exploit the channel structure may also guide the design of sampling 
mechanisms for multiuser channels that require more sophisticated cooperation schemes among users and 
are impacted in a more complex way by subsampling. 



Appendix A 
Proof of Theorem 2

We provide first an outline of the proof. A discretization argument is first used to approximate arbitrarily 
well the analog signals by discrete-time signals, which allows us to make use of the properties of Toeplitz 
matrices instead of the more general Toeplitz operators. By noise whitening, we effectively convert the 
sampled channel to a MIMO channel with i.i.d. noise for any finite time interval. Finally, the asymptotic 
properties of Toeplitz matrices are exploited in order to relate the eigenvalue distribution of the equivalent 
channel matrix with the Fourier representation of both channel filters and prefilters. The proofs of a couple 
of auxiliary lemmas are deferred to Appendix H.

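Instead of directly proving Theorem 2, we prove the theorem for a simpler scenario where the noise $\eta(t)$ is of unit spectral density. In this case, our goal is to prove that the capacity is equivalent to

$$C(f_s) = \frac{1}{2}\int_{-f_s/2}^{f_s/2} \left[ \log\left( \nu\, \frac{\sum_{l \in \mathbb{Z}} |H(f - lf_s)S(f - lf_s)|^2}{\sum_{l \in \mathbb{Z}} |S(f - lf_s)|^2} \right) \right]^+ df = \frac{1}{2}\int_{-f_s/2}^{f_s/2} \left[ \log\left( \nu\, \frac{\|V_{HS}(f, f_s)\|_2^2}{\|V_S(f, f_s)\|_2^2} \right) \right]^+ df, \qquad (44)$$

where the water level $\nu$ can be calculated through the following equation:

$$P = \int_{-f_s/2}^{f_s/2} \left[ \nu - \frac{\|V_S(f, f_s)\|_2^2}{\|V_{HS}(f, f_s)\|_2^2} \right]^+ df.$$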



This capacity result under white noise can then be immediately extended to accommodate colored noise. Suppose the additive noise is of power spectral density $S_\eta(f)$. We can then split the channel filter $H(f)$ into two parts with respective frequency response $H(f)/\sqrt{S_\eta(f)}$ and $\sqrt{S_\eta(f)}$. Since the colored noise is equivalent to a white Gaussian noise passed through a filter with transfer function $\sqrt{S_\eta(f)}$, the original system can be redrawn as in Fig. 14. This equivalent representation immediately leads to the capacity in the presence of colored noise by plugging in corresponding terms in (44).

Figure 14. Equivalent representation of filtering followed by sampling in the presence of colored noise.



A. Channel Discretization and Diagonalization 

Similar to the analysis for non-filtered ideal sampling, we set $T = nT_s$ and $T_s = k\Delta$. For notational



simplicity, we define 



1 



for any function $g(t)$. If $g(t)$ is a continuous function, then we have $\lim_{\Delta \to 0} g_{u,v} = g(uT_s - v\Delta)$.



We also define h{t) := h{t) * s{t). Set T = nTs and -- 
let = A • hi^Q, hi^i, • • • , hi^k-i and = A • [5^,0, ^^,1, • 
the transmit signal vector and noise vector 77 as (x^)^ = 
(77)^ = ^ /o^ ^ + ^) {i ^^). We also introduce 



/cA for some integers n and A:, and 
, Si^k-i] be 1 X A: vectors. We define 
^ X (z A + r) dr (0 < z < nk) and 



ho 
hi 
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ho 
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With these definitions, the original channel model can be approximated with the following discretized 
channel: 



(45) 



As can be seen, $\mathbf{H}^n$ is a fat block Toeplitz matrix. Moreover, $\mathbf{S}^n\mathbf{S}^{n*}$ is asymptotically equivalent to a Toeplitz matrix, as will be shown later. We note that each element $\eta_i$ is a zero-mean Gaussian variable with variance

$$E\left(|\eta_i|^2\right) = \frac{1}{\Delta^2}\int_0^{\Delta}\int_0^{\Delta} E\left(\eta(i\Delta + \tau)\,\eta^*(i\Delta + \sigma)\right) d\tau\, d\sigma = \frac{1}{\Delta},$$

and that $E(\eta_i\eta_l^*) = 0$ for any $i \ne l$, thus implying that $\eta$ is an i.i.d. Gaussian vector with each entry having variance $1/\Delta$. The filtered noise $\mathbf{S}^n\eta$ is no longer i.i.d. Gaussian, and hence we first attempt to whiten the noise.

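Setting $\tilde{\mathbf{S}}^n := (\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{S}^n$, we see that

$$E\left(\tilde{\mathbf{S}}^n\eta\,(\tilde{\mathbf{S}}^n\eta)^*\right) = \tilde{\mathbf{S}}^n E(\eta\eta^*)\,\tilde{\mathbf{S}}^{n*} = \frac{1}{\Delta}\tilde{\mathbf{S}}^n\tilde{\mathbf{S}}^{n*} = \frac{1}{\Delta}(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{S}^n\mathbf{S}^{n*}(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}} = \frac{1}{\Delta}\mathbf{I}. \qquad (46)$$

This basically implies that $\tilde{\mathbf{S}}^n$ projects the i.i.d. Gaussian noise $\eta$ onto an orthogonal $n$-dimensional subspace, i.e. $(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{S}^n\eta^n$ is still i.i.d. Gaussian noise. Applying this whitening operation, we get

$$\begin{aligned}
\tilde{y}^n &:= (\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}} y^n && (47)\\
&= (\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{H}^n x^n + (\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{S}^n\eta^n && (48)\\
&= (\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{H}^n x^n + \tilde{\eta}^n. && (49)
\end{aligned}$$

Here, $\tilde{\eta}^n$ consists of independent zero-mean Gaussian elements with variance $1/\Delta$. Since $\mathbf{S}^n$ is of full row rank, the whitening operation is invertible, and the data processing inequality immediately yields that

$$I(x^n; \tilde{y}^n) = I(x^n; y^n). \qquad (50)$$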
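The whitening identity (46) can be checked numerically on a small example. This is a minimal sketch under stated assumptions: the fat matrix `S` is a random real matrix standing in for $\mathbf{S}^n$, its size and the value of `delta` are made up, and the helper name `inv_sqrt_psd` is hypothetical.

```python
import numpy as np

def inv_sqrt_psd(m):
    """Inverse square root of a Hermitian positive definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v / np.sqrt(w)) @ v.conj().T

rng = np.random.default_rng(0)
n, cols, delta = 4, 12, 0.1           # assumed sizes and discretization step
S = rng.standard_normal((n, cols))    # stand-in for the fat matrix S^n (full row rank)

S_tilde = inv_sqrt_psd(S @ S.T) @ S   # whitened filter (S S*)^{-1/2} S
noise_cov = (1.0 / delta) * (S_tilde @ S_tilde.T)  # covariance of the whitened noise

print(np.allclose(noise_cov, np.eye(n) / delta))
```

The check relies on `S` having full row rank, which is what makes $\mathbf{S}^n\mathbf{S}^{n*}$ invertible in the first place.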

Consequently, we can express the capacity of the sampled analog channel as the following limit:

$$C(f_s) = \lim_{k \to \infty}\lim_{n \to \infty} \frac{1}{nT_s} \sup I(x^n; y^n) = \lim_{k \to \infty}\lim_{n \to \infty} \frac{1}{nT_s} \sup I(x^n; \tilde{y}^n).$$

We note that even when there exists no integer $n$ such that $T = nT_s$, the capacity can be bounded through the following fact. Since the proof of this lemma is straightforward, we defer it to Appendix H-A.



Lemma 1. Suppose the following limit

$$\lim_{n \to \infty} \frac{1}{nT_s} \sup I\left(x(0, nT_s]; \{y[n]\}\right) \qquad (51)$$

exists. Then we have

$$\lim_{T \to \infty} \frac{1}{T} \sup I\left(x(0, T]; \{y[n]\}\right) = \lim_{n \to \infty} \frac{1}{nT_s} \sup I\left(x(0, nT_s]; \{y[n]\}\right). \qquad (52)$$

Hence, it suffices to investigate the case when $T$ is an integer multiple of $T_s$.

B. Preliminaries on Toeplitz Matrices 

Before proceeding to the proof of the theorem, we briefly introduce several basic definitions and 
properties related to Toeplitz matrices. Interested readers are referred to Q, pT| for more details. 

A Toeplitz matrix is an $n \times n$ matrix $\mathbf{T}^n$ where $(\mathbf{T}^n)_{k,l} = t_{k-l}$, which implies that a Toeplitz matrix is uniquely defined by the sequence $\{t_k\}$. A special case of Toeplitz matrices is circulant matrices, where every row of the matrix is a right cyclic shift of the row above it. The Fourier series (or symbol) with respect to the sequence of Toeplitz matrices $\{\mathbf{T}^n := [t_{k-l};\ k, l = 0, 1, \cdots, n-1] : n \in \mathbb{Z}\}$ is given by

$$F(\omega) = \sum_{k=-\infty}^{\infty} t_k \exp(jk\omega), \quad \omega \in [-\pi, \pi]. \qquad (53)$$

Since the sequence $\{t_k\}$ uniquely determines $F(\omega)$ and vice versa, we denote by $\mathbf{T}^n(F)$ the Toeplitz matrix generated by $F$ (and hence $\{t_k\}$). We also define a related circulant matrix $\mathbf{C}^n(F)$ with top row $(c_0^{(n)}, c_1^{(n)}, \cdots, c_{n-1}^{(n)})$, where

$$c_k^{(n)} = \frac{1}{n}\sum_{i=0}^{n-1} F\left(\frac{2\pi i}{n}\right) \exp\left(\frac{2\pi jik}{n}\right). \qquad (54)$$

One key concept in our proof is asymptotic equivalence, which is formally defined as follows [51].

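Definition 1 (Asymptotic Equivalence). Two sequences of $n \times n$ matrices $\{\mathbf{A}^n\}$ and $\{\mathbf{B}^n\}$ are said to be asymptotically equivalent if

(1) $\mathbf{A}^n$ and $\mathbf{B}^n$ are uniformly bounded, i.e. there exists a constant $c$ independent of $n$ such that

$$\|\mathbf{A}^n\|_2, \|\mathbf{B}^n\|_2 \le c < \infty, \quad n = 1, 2, \cdots \qquad (55)$$

(2) $\lim_{n \to \infty} \frac{1}{\sqrt{n}} \|\mathbf{A}^n - \mathbf{B}^n\|_F = 0$.

We will abbreviate asymptotic equivalence of $\{\mathbf{A}^n\}$ and $\{\mathbf{B}^n\}$ by $\mathbf{A}^n \sim \mathbf{B}^n$. Two important results regarding asymptotic equivalence are given in the following lemmas [51].

Lemma 2. Suppose $\mathbf{A}^n \sim \mathbf{B}^n$ with eigenvalues $\{\alpha_{n,k}\}$ and $\{\beta_{n,k}\}$, respectively. Let $g(x)$ be an arbitrary continuous function. Then if the limits $\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} g(\alpha_{n,k})$ and $\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} g(\beta_{n,k})$ exist, we have

$$\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} g(\alpha_{n,k}) = \lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} g(\beta_{n,k}). \qquad (56)$$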

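Lemma 3. (a) Suppose a sequence of Toeplitz matrices $\mathbf{T}^n$ where $(\mathbf{T}^n)_{ij} = t_{i-j}$ satisfies that $\{t_i\}$ is absolutely summable. Suppose the Fourier series $F(\omega)$ related to $\mathbf{T}^n$ is positive and $\mathbf{T}^n$ is Hermitian. Then we have

$$\mathbf{T}^n(F) \sim \mathbf{C}^n(F). \qquad (57)$$

If we further assume that there exists a constant $\epsilon > 0$ such that $F(\omega) \ge \epsilon > 0$ for all $\omega \in [0, 2\pi]$, then we have

$$\mathbf{T}^n(F)^{-1} \sim \mathbf{C}^n(F)^{-1} = \mathbf{C}^n(1/F) \sim \mathbf{T}^n(1/F). \qquad (58)$$

(b) Suppose $\mathbf{A}^n \sim \mathbf{B}^n$ and $\mathbf{C}^n \sim \mathbf{D}^n$, then $\mathbf{A}^n\mathbf{C}^n \sim \mathbf{B}^n\mathbf{D}^n$.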
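Condition (2) of Definition 1 and relation (57) can be observed numerically for a simple symbol. The sketch below is an illustration under an assumed example that does not appear in the paper: the sequence $t_k = r^{|k|}$ with $r = 0.5$, whose symbol has the closed form $F(\omega) = (1 - r^2)/(1 - 2r\cos\omega + r^2)$. The circulant top row is computed from (54) with an inverse FFT, and the function names are hypothetical.

```python
import numpy as np

def symbol(omega, r=0.5):
    """F(omega) = sum_k r^{|k|} exp(j k omega), written in closed form."""
    return (1 - r**2) / (1 - 2 * r * np.cos(omega) + r**2)

def toeplitz_matrix(n, r=0.5):
    """n x n Toeplitz matrix with entries t_{k-l} = r^{|k-l|}."""
    idx = np.arange(n)
    return r ** np.abs(idx[:, None] - idx[None, :])

def circulant_matrix(n, r=0.5):
    """Circulant matrix whose top row follows (54): samples of F passed through an inverse DFT."""
    samples = symbol(2 * np.pi * np.arange(n) / n, r)
    top = np.fft.ifft(samples).real
    idx = np.arange(n)
    return top[(idx[None, :] - idx[:, None]) % n]  # each row is a right cyclic shift

def weak_distance(n, r=0.5):
    """(1/sqrt(n)) * Frobenius norm of T^n(F) - C^n(F)."""
    diff = toeplitz_matrix(n, r) - circulant_matrix(n, r)
    return np.linalg.norm(diff, "fro") / np.sqrt(n)

for n in (16, 64, 256):
    print(n, weak_distance(n))
```

The printed distances shrink as $n$ grows, because the two matrices differ noticeably only near the corners, where the circulant wraps around.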

Toeplitz or block Toeplitz matrices have well-known asymptotic spectral properties [4], [52]. The notion
of asymptotic equivalence allows us to approximate non-Toeplitz matrices by Toeplitz matrices, which 
we will use in the next subsection to analyze the spectral properties of the channel matrix. 



C. Capacity via Convergence of the Discrete Model 

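To prove the capacity theorem, we first construct an asymptotically equivalent channel matrix and obtain its capacity. This requires us to first exploit the asymptotic spectral properties of the Hermitian matrices $\mathbf{S}^n\mathbf{S}^{n*}$ and $\mathbf{H}^n\mathbf{H}^{n*}$. In particular, for any $1 \le i \le j \le n$, we have

$$(\mathbf{S}^n\mathbf{S}^{n*})_{ij} = (\mathbf{S}^n\mathbf{S}^{n*})^*_{ji} = \sum_{t=-\infty}^{\infty} \mathbf{s}_{j-i+t}\,\mathbf{s}_t^*. \qquad (59)$$

Hence, the Hermitian matrix $\hat{\mathbf{S}}^n := \mathbf{S}^n\mathbf{S}^{n*}$ is still Toeplitz. On the other hand,

$$(\mathbf{H}^n\mathbf{H}^{n*})_{ij} = (\mathbf{H}^n\mathbf{H}^{n*})^*_{ji} = \sum_{t=-j+1}^{n-j} \mathbf{h}_{j-i+t}\,\mathbf{h}_t^*. \qquad (60)$$

Obviously, $\mathbf{H}^n\mathbf{H}^{n*}$ is not a Toeplitz matrix. Instead of investigating the eigenvalue distribution of $\mathbf{H}^n\mathbf{H}^{n*}$ directly, we look at a new Hermitian Toeplitz matrix $\hat{\mathbf{H}}^n$ associated with $\mathbf{H}^n\mathbf{H}^{n*}$ such that for any $i \le j$:

$$(\hat{\mathbf{H}}^n)_{ij} = (\hat{\mathbf{H}}^n)^*_{ji} = \sum_{t=-\infty}^{\infty} \mathbf{h}_{j-i+t}\,\mathbf{h}_t^*. \qquad (61)$$

Lemma 4. The above definition of $\hat{\mathbf{H}}^n$ implies that

$$\hat{\mathbf{H}}^n \sim \mathbf{H}^n\mathbf{H}^{n*}. \qquad (62)$$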



Proof: See Appendix |H-B for the detailed derivation. 



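Next, we construct the circulant matrix $\mathbf{C}^n$ as defined in (54). The following lemma relates $(\mathbf{C}^n)^{-1}$ with $(\mathbf{S}^n\mathbf{S}^{n*})^{-1}$.

Lemma 5. If there exists some constant $\epsilon_s > 0$ such that for all $f \in [-f_s/2, f_s/2]$,

$$\sum_{l=-\infty}^{\infty} |S(f - lf_s)|^2 \ge \epsilon_s > 0 \qquad (63)$$

holds, then $(\mathbf{C}^n)^{-1} \sim (\mathbf{S}^n\mathbf{S}^{n*})^{-1}$.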



Proof: See Appendix |H-C 



One of the most useful properties of a circulant matrix is that its eigenvectors are 



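Suppose the eigenvalue decomposition of $\mathbf{C}^n$ is given as

$$\mathbf{C}^n = \mathbf{U}_c \mathbf{\Lambda}_c \mathbf{U}_c^*, \qquad (65)$$

where $\mathbf{U}_c$ is a Fourier coefficient matrix, and $\mathbf{\Lambda}_c$ is a diagonal matrix where each element on the diagonal is positive.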

The concept of asymptotic equivalence allows us to explicitly relate our matrices of interest to both 
circulant matrices and Toeplitz matrices, whose asymptotic spectral densities have been well studied. 

Lemma 6. For any continuous function g{x), we have 



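where $\lambda_i$ denotes the $i$th eigenvalue of $(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{H}^n\mathbf{H}^{n*}(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}$.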



Proof: See Appendix |H-D 



We can now prove the capacity result. The standard capacity results for parallel channels [1, Theorem 7.5.1] imply that the capacity of the discretized sampled analog channel is given by the parametric equations



Pnk 

TTa 



i:\i>h'~ 

E 

1 ~ 



A. 



(66) 
(67) 



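where $\{\lambda_i\}$ denote the eigenvalues of $(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}\mathbf{H}^n\mathbf{H}^{n*}(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}$, and $\nu$ is the water level of the optimal power allocation over this discrete model, as can be calculated through (67).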
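The water-filling step for a finite set of parallel Gaussian channels can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's exact normalization: each sub-channel has gain $\lambda_i$ and unit noise variance, rates are in nats, and the $1/T$ and $\Delta$ scaling factors of (66) and (67) are omitted. The function names are hypothetical.

```python
import math


def water_level(gains, power, tol=1e-12):
    """Bisection for nu such that sum_i max(nu - 1/gain_i, 0) equals power."""
    positive = [g for g in gains if g > 0]
    lo, hi = 0.0, power + max(1.0 / g for g in positive)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        used = sum(max(mid - 1.0 / g, 0.0) for g in positive)
        if used > power:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def water_filling(gains, power):
    """Return (nu, per-channel powers, total rate in nats)."""
    nu = water_level(gains, power)
    alloc = [max(nu - 1.0 / g, 0.0) if g > 0 else 0.0 for g in gains]
    # Only sub-channels with nu * gain > 1 receive power and contribute rate.
    rate = sum(0.5 * math.log(nu * g) for g in gains if g > 0 and nu * g > 1.0)
    return nu, alloc, rate
```

For gains `[4.0, 1.0, 0.1]` and total power 1, the two strongest sub-channels are active and the weakest one is left empty, which mirrors the restriction of the sums to $\lambda_i > 1/\nu$.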

By passing to the limit $T \to \infty$, we can exploit the asymptotic spectral properties of Toeplitz matrices to obtain



lim Ct (jy) = lim ^ l^og [uXi 

v.\{>\lv 



1 /• , / ||Vffs(/,/. ^"' 



2/ - _, l^sH^TTTT^Id/, 



where T {y) ^ \j : "^^^l^f^f > Similarly, (pi can be transformed into 



Pk 

1/A n . 



- E 



V — 



l|V^(/,/.)ll2 ' 

l|Vif^(/,/.)||^ 



(68) 
(69) 



which completes the proof. 



Appendix B 
Proof of Theorem 3

We follow similar steps as in the proof of Theorem 2: we first approximate the sampled channel using a discretized model, whiten the noise, and then find the capacity of the equivalent channel matrix. Due to the use of filter banks, the equivalent channel matrix is no longer asymptotically equivalent to a Toeplitz matrix, but instead to a block-Toeplitz matrix. This motivates us to exploit the asymptotic properties of block-Toeplitz matrices.

A. Channel Discretization and Diagonalization

Let $\tilde{T}_s = MT_s$, and suppose we have $T = n\tilde{T}_s$ and $\tilde{T}_s = k\Delta$ with integers $n$ and $k$. Similarly, we can define



/i,(t) :=/i(t)*5,(t), and = k(lfs),k(lfs-A),---,k(lfs-{k-l)A 
We introduce the following two matrices as



hi h? 



'n— 1 un— 2 



-n+2 



and 



=1 



We also set (x") • ^ ^ x {iA + r) dr (0 < i < nfc), and (r?) • = ^ /q"^ (iA + r) dr (i G Z). Defining 
y" = [2/1 [0] , • ■ ■ ) 2/1 ~ 1] ) 2/2 [0] , • • • , 2/2 [^^ — 1] , • ■ ■ ) Vm [n — 1]]^ leads to the discretized channel model 



1 f^, 



Whitening the noise gives us 



(70) 



^2 



on 



on* 



H 



M 



(71) 



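where $\tilde{\boldsymbol{\eta}}$ is an i.i.d. Gaussian vector whose entries have variance $1/\Delta$. We can express the capacity of the sampled analog channel under filter-bank sampling as the following limit

$$C(f_s) = \lim_{k\to\infty}\lim_{n\to\infty} \frac{f_s}{Mn}\sup I(\mathbf{x}^n; \tilde{\mathbf{y}}^n).$$

Here, the supremum is taken over all distributions of $\mathbf{x}^n$ subject to a power constraint $\frac{1}{nk}\mathbb{E}\left(\|\mathbf{x}^n\|_2^2\right) \le P$.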



B. Capacity via Convergence of the Discrete Model 
We can see that for any 1 < u^v < m, 

onon* on 

where the Toeplitz matrix S^^ is defined such that for any 1 < i < j < n 

oo 
t=—oo 



(72) 



(73) 
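Let $\mathbf{S}^n = [\mathbf{S}_1^{n*}, \mathbf{S}_2^{n*}, \cdots, \mathbf{S}_M^{n*}]^*$. Then the Hermitian block Toeplitz matrix

$$\hat{\mathbf{S}}^n := \begin{bmatrix} \hat{\mathbf{S}}^n_{1,1} & \hat{\mathbf{S}}^n_{1,2} & \cdots & \hat{\mathbf{S}}^n_{1,M} \\ \hat{\mathbf{S}}^n_{2,1} & \hat{\mathbf{S}}^n_{2,2} & \cdots & \hat{\mathbf{S}}^n_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\mathbf{S}}^n_{M,1} & \hat{\mathbf{S}}^n_{M,2} & \cdots & \hat{\mathbf{S}}^n_{M,M} \end{bmatrix}$$

satisfies $\hat{\mathbf{S}}^n = \mathbf{S}^n\mathbf{S}^{n*}$. Additionally, we define $\hat{\mathbf{H}}^n_{u,v}$ $(1 \le u, v \le M)$, where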



oo 



(74) 



and we let = 



Hn* "Ljn* XT' 



. The block Toeplitz matrix 



Hn trn 

1,1 ^1,2 

Tj-n -LTn 

■"-2,1 -"-2,2 



Tjn 
* ■'^1,M 



H 



2,M 



trn "H"^ "H"' 
■"■M,l ■"■M,2 * * * 



M,M 



satisfies 



lim --= 



jjn jjnjjn* 



< lim ^ 

F n^oo 



y - 



l<u,v<M 



Trn TrnTrn* 

u.v V 



= 0, 



(75) 



The M X M Fourier symbol matrix Fs(/) associated with S** has elements [Fs {f)]uv given by 
[F.- (/)]„,. (76) 

(11) 



A 2 ^=-1 / 

A 



=f E Su{-f+ifs)s:[-f+ifs). 



(78) 



(79) 



Denote by {T'^ (FJ^)} the sequence of block Toeplitz matrices generated by F~ ^(/), and denote by 
i^i^) (^1' ^2) Toeplitz block of (F^^). It can be verified that 



M / M \ 

E (FJ^) • Sr,,3 ~ tJ J] [FJ^],^^^^ [F,],^,3 = iS[h - h]) , 
h=l \h=l / 



(80) 
which immediately yields

(FJ^) - I 



(S"S 



(81) 



Therefore, for any continuous function $g(x)$, [52, Theorem 5.4] implies that



nM 



i=l 2M i=l 

Finally, the capacity of parallel channels [1], which is achieved via water-filling power allocation, yields



1^ M 

2M 



C(f. 



i=l 



' 2M i=l 

1^ ^ M 

2M 1 



- J2 [log (i^Ai ((f,f:)-^ f,f,f;;f: (f,f:)-^ 



where 



' 2M Z=l 



M 

2M 



d/, 



' 2M i=l 

^ M 

2M 



d/ 



' E - ((F.F:)-i f,f,f;;f: (f,f:)-^ 



' 2M i=l 



d/. 



This completes the proof. 



(82) 
(83) 
(84) 

(85) 
(86) 



Appendix C 
Proof of Theorem 4

Following similar steps as in the proof of Theorem 3, we approximately convert the sampled channel into its discrete counterpart, and calculate the capacity of the discretized channel model after noise whitening. We note that the sampled channel is no longer LTI due to the use of modulation banks. But the periodicity assumption on the modulation sequences allows us to treat the channel matrix as blockwise LTI, which provides a way to exploit the properties of block-Toeplitz matrices.

Again, we give a proof for the scenario where noise is white Gaussian with unit spectral density. The capacity expression in the presence of colored noise can immediately be derived by replacing $P_i(f)$ with $P_i(f)\sqrt{S_\eta(f)}$ and $H(f)$ with $H(f)/\sqrt{S_\eta(f)}$.

In the $i$th branch, the noise component at time $t$ is given by

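$$s_i(t) * \big(q_i(t)\cdot(p_i(t) * \eta(t))\big) = \int d\tau_1\, s_i(t - \tau_1)\, q_i(\tau_1) \int p_i(\tau_1 - \tau_2)\,\eta(\tau_2)\, d\tau_2$$

$$= \int_{\tau_2}\left(\int s_i(t - \tau_1)\, q_i(\tau_1)\, p_i(\tau_1 - \tau_2)\, d\tau_1\right)\eta(\tau_2)\, d\tau_2 = \int g_i^{\eta}(t, \tau_2)\,\eta(\tau_2)\, d\tau_2,$$

where

$$g_i^{\eta}(t, \tau_2) := \int s_i(t - \tau_1)\, q_i(\tau_1)\, p_i(\tau_1 - \tau_2)\, d\tau_1.$$

Let $\tilde{T}_s = MT_s$. Our assumption $bT_q = a\tilde{T}_s$ immediately leads to

$$g_i^{\eta}(t + a\tilde{T}_s, \tau + bT_q) = \int s_i(t + a\tilde{T}_s - \tau_1)\, q_i(\tau_1)\, p_i(\tau_1 - \tau - a\tilde{T}_s)\, d\tau_1$$

$$= \int s_i(t - \tau_1)\, q_i(\tau_1 + bT_q)\, p_i(\tau_1 - \tau)\, d\tau_1$$

$$= \int s_i(t - \tau_1)\, q_i(\tau_1)\, p_i(\tau_1 - \tau)\, d\tau_1 = g_i^{\eta}(t, \tau),$$

implying that $g_i^{\eta}(t, \tau)$ is a block-Toeplitz function.

Similarly, the signal component is

$$s_i(t) * \big(q_i(t)\cdot(p_i(t) * h(t) * x(t))\big) = \int g_i^{h}(t, \tau_2)\, x(\tau_2)\, d\tau_2,$$

where

$$g_i^{h}(t, \tau_2) := \iint s_i(t - \tau_1)\, q_i(\tau_1)\, p_i(\tau_1 - \tau_2 - \tau_3)\, h(\tau_3)\, d\tau_3\, d\tau_1,$$

which also satisfies the block-Toeplitz property $g_i^{h}(t + a\tilde{T}_s, \tau + bT_q) = g_i^{h}(t, \tau)$.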

Suppose that $T = n\tilde{T}_s$ and $\tilde{T}_s = k\Delta$ hold for some integers $n$ and $k$. We can introduce two matrices $\mathbf{G}_i^{\eta}$ and $\mathbf{G}_i^{h}$ such that $\forall m \in \mathbb{Z}$, $0 \le l < n$

Setting $\mathbf{y}_i^n = [y_i[0], y_i[1], \cdots, y_i[n-1]]^T$ leads to a similar discretized approximation as in the proof of Theorem 2:

$$\mathbf{y}_i^n = \mathbf{G}_i^{h}\mathbf{x}^n + \mathbf{G}_i^{\eta}\boldsymbol{\eta}. \tag{87}$$

Here, $\boldsymbol{\eta}$ is an i.i.d. zero-mean Gaussian vector where each entry is of variance $1/\Delta$.

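Hence, $\mathbf{G}_i^{\eta}$ and $\mathbf{G}_i^{h}$ are block Toeplitz matrices satisfying $(\mathbf{G}_i^{\eta})_{l+a,\,m+ak} = (\mathbf{G}_i^{\eta})_{l,m}$ and $(\mathbf{G}_i^{h})_{l+a,\,m+ak} = (\mathbf{G}_i^{h})_{l,m}$. Using the same definition of $\mathbf{x}^n$ and $\boldsymbol{\eta}$ as in Appendix B, we can express the system equation as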


(88) 



Whitening the noise component yields 



V 



" G? ■ 




" G? " 




2 


G^^ 


G^ 




G^ 






G^ 








) 




_ _ 



(89) 



where $\tilde{\boldsymbol{\eta}}$ is i.i.d. Gaussian noise with variance $1/\Delta$.

In order to calculate the capacity limit, we need to investigate the Fourier symbols associated with 
these block Toeplitz matrices. 



Lemma 7. At a given frequency f, the Fourier symbol with respect to G^G^ is given by aA:F^F^F^*F^*, 
and the Fourier symbol with respect to G^G|^ is given by a/cF^F^F^F^*F^*F^*. Here for any {l,v) 
such that V ^ Z and 1 < I < a, we have 

Also, Fa and F^ are infinite diagonal matrices such that for all I ^ Z 

=Pa{-f + li), 

Proof: See Appendix |H-E| ■ 
Define G^ such that its {a, (3) subblock is G^G^*, and G^ such that its {a, 13) subblock is G^G^*. 
Proceeding similarly as in the proof of Theorem [3] we obtain 

T (G^) = afcF^F^* and T (g^^ = aA:F^F^F^*F^*, 

where F^ contain M x 1 submatrices. The {a, 1) submatrix of F^ is given by F^F^. 
For any continuous function $g(x)$, [52, Theorem 5.4] implies that

naM 



A-;iE9(A.{(G')-G'(G-.)-i}) 

i=l 

/is. aM 
(^^ (^(F^F^*)-2 F^F^F^*F^* (F^F^*)-2^) d/. 



'2a 1=1 

Therefore, capacity of parallel channels, achieved via water-filling power allocation, yields 

^ naM 
.is ^ aM 



(90) 



[log(^A, (^(F^F^*)-2 F^F^F^*F^*(F^F^*)-2^)]^d/, (91) 



1=1 

where the water level $\nu$ can be computed through the following parametric equation

naM 



(92) 



P= lim ^r-V ^-Ai/(G'')-^G'*(G'')-H 

i=l 

//a aM _l_ 
5] - (^(FT"*)-^ F''F'^F'**F''* (F^F"*)-^)] d/. (93) 

Appendix D 
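Proof of Corollary 2

Corollary 2 immediately follows from the following proposition.

Proposition 5. The $k$th largest eigenvalue $\lambda_k$ of the positive semidefinite matrix $\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*$ is bounded by

$$0 \le \lambda_k \le \lambda_k\left(\mathbf{F}_h\mathbf{F}_h^*\right), \quad 1 \le k \le M. \tag{94}$$

These upper bounds can be attained simultaneously by the filter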



Proof: See Appendix G.

By extracting out the $M$ frequencies with the highest SNR from each aliased set $\{f - lf_s/M \mid l \in \mathbb{Z}\}$, we achieve $\lambda_k = \lambda_k(\mathbf{F}_h\mathbf{F}_h^*)$, thus achieving the maximum sampled channel capacity.
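The selection rule described above, keeping the $M$ aliases with the highest SNR in each aliased set, can be sketched as follows. This is only an illustration: the function name, the truncation of the aliased set to a finite window (`span`), and the tie-breaking rule are assumptions, and the channel and noise spectra are supplied by the caller.

```python
def best_aliases(channel_gain, noise_psd, f, alias_spacing, M, span=8):
    """Pick the M alias indices l with the largest SNR at frequencies f - l * alias_spacing.

    channel_gain(f) returns H(f); noise_psd(f) returns the noise spectral density.
    alias_spacing plays the role of f_s / M; the aliased set is truncated to |l| <= span.
    """
    candidates = []
    for l in range(-span, span + 1):
        fl = f - l * alias_spacing
        snr = abs(channel_gain(fl)) ** 2 / noise_psd(fl)
        candidates.append((snr, l))
    # Sort by decreasing SNR; break ties by preferring the alias closest to l = 0.
    candidates.sort(key=lambda p: (-p[0], abs(p[1]), p[1]))
    chosen = candidates[:M]
    return [l for _, l in chosen], [s for s, _ in chosen]
```

For a low-pass channel the chosen indices cluster around the aliases nearest to zero frequency, while for a channel with several passbands they can be non-contiguous.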

Appendix E 
Proof of Proposition^ 

It can be easily observed that Algorithm 1 keeps $K$ subbands in total while zeroing out all others through filtering. Define $R(f) = H(f)X(f) + N(f)$. In step 2 of the algorithm, the frequency response of the $i$th subband being chosen is a linear combination of $\{H(f - lf_q)X(f - lf_q) + N(f - lf_q) \mid l \in \{l_1, \cdots, l_K\}\}$. More specifically,

K 

Y{f-hU)^A''+^-'R{f-hU)+ J2 BkRU-hU)^ 

k=i+l 

^ V ' 

residual 

where \B]^\ is either 1^4^+^"^ | or 0. Treating the residual term as noise, the SNR at the zth branch is 

lim SNR, - l^^^-^^^^^l' 



A^oo 5^ (/ - lifp) 

Thus, for large A, this sampling method extracts out K subbands of highest SNR, and suppresses all 
other subbands. 

Now we need to prove that this is optimal. For any given $f$, modulation and filtering act as two right-invertible operators on both the signal and the noise. After noise whitening, the equivalent channel matrix is given by

where (F^F^*)~2 f^ is of K rows orthonormal to each other and is a diagonal matrix. This implies 
that modulation along with filtering also plays the role of projecting the frequency components onto a K 
dimensional subspace, albeit with respect to the modulated aliased set which is larger than the original 
aliased set. Applying the same proof as for Corollary 2, the optimal method is to extract out $K$ subbands with the highest SNR, which coincides with the asymptotic performance of the sampling method derived by Algorithm 1.

Appendix F 
Proof of PropositionH] 

Denote by $y_k(t)$ the analog signal after passing through the $k$th prefilter prior to ideal sampling. When both the input signal $x(t)$ and the noise $\eta(t)$ are Gaussian, the MMSE estimator of $x(t)$ from samples $\{y_k[n] \mid 1 \le k \le M\}$ is linear. Recall that $\tilde{T}_s = MT_s$ and $\tilde{f}_s = f_s/M$. A linear estimator of $x(t)$ from $\mathbf{y}[n]$ can be given as

where we use the vector form $\mathbf{g}(t) = [g_1(t), \cdots, g_M(t)]^T$ and $\mathbf{y}(t) = [y_1(t), \cdots, y_M(t)]^T$ for notational simplicity. Here, $g_k(t)$ denotes the interpolation function operating upon the samples in the $k$th branch. We propose to find the optimal estimator $\mathbf{g}(t)$ that minimizes the mean square estimation error $\mathbb{E}\left(|x(t) - \hat{x}(t)|^2\right)$ for some $t$.

From the orthogonality principle [50], the MMSE estimate $\hat{x}(t)$ obeys



(96) 

Since $x(t)$ and $\eta(t)$ are both stationary Gaussian processes, we can define $\mathbf{R}_{XY}(\tau) := \mathbb{E}\left(x(t)\mathbf{y}^*(t - \tau)\right)$ to be the cross correlation function between $x(t)$ and $\mathbf{y}(t)$, and $\mathbf{R}_Y(\tau) := \mathbb{E}\left(\mathbf{y}(t)\mathbf{y}^*(t - \tau)\right)$ the autocorrelation function of $\mathbf{y}(t)$. Plugging (95) into (96) leads to the following relation



(97) 



Replacing $t$ by $t + l\tilde{T}_s$, we can equivalently express it as

^XY{t) = 5^ + ifs - kfs) Ry (kfs - ifs) = 5^ g^ - ITs) Ry (/T, ) , (98) 



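which is equivalent to the convolution of $\mathbf{g}(t)$ and $\mathbf{R}_Y(t)\cdot\sum_{l\in\mathbb{Z}}\delta(t - l\tilde{T}_s)$.

Let $\mathcal{F}(\cdot)$ denote the Fourier transform operator. Define the cross spectral density $\mathbf{S}_{XY}(f) := \mathcal{F}(\mathbf{R}_{XY}(\tau))$ and $\mathbf{S}_Y(f) = \mathcal{F}(\mathbf{R}_Y(\tau))$. By taking the Fourier transform on both sides of (98), we have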



which immediately yields 



G(/) = Sxy(/) 



T' 2" 
(99) 



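Since the noise $\eta(t)$ is independent of $x(t)$, the cross correlation function $\mathbf{R}_{XY}(\tau)$ is

$$\mathbf{R}_{XY}(\tau) = \mathbb{E}\left(x(t + \tau)\left[(s_1 * h * x)^*(t), \cdots, (s_M * h * x)^*(t)\right]\right),$$

which allows the cross spectral density to be derived as

$$\mathbf{S}_{XY}(f) = H^*(f)\,S_X(f)\left[S_1^*(f), \cdots, S_M^*(f)\right]. \tag{100}$$

Additionally, the spectral density of $\mathbf{y}(t)$ can be given as the following $M \times M$ matrix

$$\mathbf{S}_Y(f) = \left(|H(f)|^2 S_X(f) + S_\eta(f)\right)\mathbf{S}(f)\mathbf{S}^*(f), \tag{101}$$

with $S_\eta(f)$ denoting the spectral density of the noise $\eta(t)$, and $\mathbf{S}(f) = [S_1(f), \cdots, S_M(f)]^T$.

Define $\mathbf{K}(f) := \sum_{l\in\mathbb{Z}}\left(|H(f - l\tilde{f}_s)|^2 S_X(f - l\tilde{f}_s) + S_\eta(f - l\tilde{f}_s)\right)\mathbf{S}(f - l\tilde{f}_s)\mathbf{S}^*(f - l\tilde{f}_s)$. The Wiener-Hopf linear reconstruction filter can now be written as

$$\mathbf{G}(f) = H^*(f)\,S_X(f)\,\mathbf{S}^*(f)\,\mathbf{K}^{-1}(f).$$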
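For intuition, the reconstruction filter can be evaluated numerically in the single-branch case ($M = 1$), where $\mathbf{K}(f)$ reduces to a scalar sum over aliases. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the alias sum is truncated to `|l| <= span`, and the spectra are passed in as plain Python callables.

```python
def wiener_single_branch(H, S, Sx, Seta, f, fs, span=50):
    """Evaluate G(f) = H*(f) Sx(f) S*(f) / K(f) for one sampling branch.

    K(f) = sum_l (|H(f - l fs)|^2 Sx(f - l fs) + Seta(f - l fs)) |S(f - l fs)|^2,
    truncated to |l| <= span (an assumption made for this sketch).
    """
    K = 0.0
    for l in range(-span, span + 1):
        fl = f - l * fs
        K += (abs(H(fl)) ** 2 * Sx(fl) + Seta(fl)) * abs(S(fl)) ** 2
    if K == 0.0:
        raise ValueError("K(f) vanishes: the prefilter passes no alias of f")
    return complex(H(f)).conjugate() * Sx(f) * complex(S(f)).conjugate() / K
```

When the prefilter removes every alias except $l = 0$, the expression collapses to the familiar non-aliased Wiener form $H^* S_X / (|H|^2 S_X + S_\eta)$ divided by $S(f)$.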
Define $R_X(\tau) = \mathbb{E}\left(x(t)x^*(t - \tau)\right)$. Since $\int_{-\infty}^{\infty} S_X(f)\,df = R_X(0)$, the resulting MSE is



^(i)=E(|x(t)|2) -E(|x(t)p) =E(|x(t)|2) -E(x(i)£*(i)) 
= RxiO) - E ^a;(t) (^Eg^(* - ^r«)yar«)^ ^ 
= Rx (0) - ^Rxy(t - /r,)g(t - ITs). 



(102) 
(103) 
(104) 



Define ((/) := |i?(/)<Sx(/)|' S*(/)K-i(/)S(/). Since Tig{-t)) = (G*(/))^ and Sxy 
H*{f)Sx{f)S*{f), Parseval's identity implies that 



/OO 
[Sx{f)-G*{f)S^Y] d/ 
-OO 

/OO |- -. 
Sxif) - \H{f)SxU)fS*if)K-\f)S{f) df 
-OO 

-It 



fs/2 
fs/2 



E Sx{f-ifs)-tvjif,h)-i 



J= — OO 



d/. 



Suppose that we impose power constraints J^iez'^^if ~ ^fs) ~ ^(/)' define ({f) := 
|i7(/)5x(/)P S*(/)K~^(/)S(/). For a given input process x{t), the problem of finding the optimal 
prefilter S(/) that minimizes MSE then becomes 



maximize ^Jif, fs) ' 



where the objective function can be alternatively rewritten in matrix form 



trace {f|f;;f: (F, (F;,F;; + F^) Fl)-^ FsFhFj,] 



(105) 



Here Fx and F^ are diagonal matrices such that (Fx)/ 1 = Sx{f — Ifs) and (F^)^ ^ = <5^(/ + kfs). We 
observe that

trace {f|f^F: (F, (F^F^ + F^) F,*)"^ F,F^f|} (106) 

=trace {(F, {F^Fl + F^) F,*)"^ F^F/^F^F^F,*} (107) 

^=We {(YY*)-^ Y (F^F^ + F^)"^ F^F^F^ (F^F^ + F^)-^ Y*} (108) 

^=We {(F^F^ + F^)-^ F^FxF^Y* (YY*)"^ y} (109) 

^=We {(F^F^ + F^)-^ F^FxF^Y*y} (110) 

< sup trace |z(F^F^ + F^)-^F^FxF^Z*j (111) 

Z*Z=Im ^ 

M 

= 5^A,(D), (112) 

i=l 

where (a) follows by introducing Y := F^ (F/^F^ + F^)2, (b) follows from the fact that Fh, Fx, F^ 
are all diagonal matrices, (c) follows by introducing Y = (YY*)~2 Y, and (d) follows by observing 
that YY* = (YY*)"2 yY* (YY*)"^ = i. Here, D is an infinite diagonal matrix such that F>i^i = 
I rrr ^ '^7^^/'ro'^/l 7 ^ N • In Other words, the upper bound is the sum of the M largest F>ii which 

are associated with M frequency points of highest SNR (f+ifs) ^ 

Therefore, when restricted to the set of all permutations of {Sx{f)^Sx{f ± /g), • • • }, the minimum 
MSE is achieved when assigning the M largest Sx{f + Ifs) to M branches with the largest SNR. In 
this case, the corresponding optimal filter can be chosen such that 

{1, if/ = fc 
(113) 
0, otherwise. 

where k is the index of the k^^ largest element in — lfs)\^ /^vif ~ ^fs) • ^ ^ 

Appendix G 
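Proof of Proposition 5

Recall that at a given $f$, $\mathbf{F}_h$ is an infinite diagonal matrix satisfying $(\mathbf{F}_h)_{l,l} = H(f - lf_s/M)$ for all $l \in \mathbb{Z}$, and that $\tilde{\mathbf{F}}_s = (\mathbf{F}_s\mathbf{F}_s^*)^{-\frac{1}{2}}\mathbf{F}_s$. Hence, $\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*$ is an $M \times M$ dimensional matrix. We observe that

$$\tilde{\mathbf{F}}_s\tilde{\mathbf{F}}_s^* = (\mathbf{F}_s\mathbf{F}_s^*)^{-\frac{1}{2}}\,\mathbf{F}_s\mathbf{F}_s^*\,(\mathbf{F}_s\mathbf{F}_s^*)^{-\frac{1}{2}} = \mathbf{I}, \tag{114}$$

which indicates that the rows of $\tilde{\mathbf{F}}_s$ are orthonormal. Hence, the operator norm of $\tilde{\mathbf{F}}_s$ is no larger than 1, which leads to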


Ai F,F,F^F: = 



F.F^ 



< 



(115) 



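Denote by $\{\mathbf{e}_k, k \ge 1\}$ the standard basis where $\mathbf{e}_k$ is a vector with a 1 in the $k$th coordinate and 0 otherwise. We introduce the index set $\{i_1, i_2, \cdots, i_M\}$ such that $\mathbf{e}_{i_k}$ $(1 \le k \le M)$ is the eigenvector associated with the $k$th largest eigenvalue of the diagonal matrix $\mathbf{F}_h\mathbf{F}_h^*$.

Suppose that $\mathbf{v}_k$ is the eigenvector associated with the $k$th largest eigenvalue $\lambda_k$ of $\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*$, and denote by $(\tilde{\mathbf{F}}_s)_k$ the $k$th column of $\tilde{\mathbf{F}}_s$. Since $\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*$ is Hermitian positive semidefinite, its eigendecomposition yields an orthogonal basis of eigenvectors. Observe that $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ spans a $k$-dimensional space and that $\{(\tilde{\mathbf{F}}_s)_{i_j}, 1 \le j \le k-1\}$ spans a subspace of dimension no more than $k - 1$.

For any $k \ge 2$, there exist $k$ scalars $a_1, \cdots, a_k$ such that

$$\sum_{i=1}^{k} a_i\mathbf{v}_i \perp \left\{(\tilde{\mathbf{F}}_s)_{i_j},\ 1 \le j \le k-1\right\} \quad\text{and}\quad \sum_{i=1}^{k} a_i\mathbf{v}_i \ne 0. \tag{116}$$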



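This allows us to define the following unit vector

$$\tilde{\mathbf{v}}_k := \sum_{i=1}^{k}\frac{a_i}{\sqrt{\sum_{j=1}^{k}|a_j|^2}}\,\mathbf{v}_i, \tag{117}$$

which is orthogonal to $\{(\tilde{\mathbf{F}}_s)_{i_j},\ 1 \le j \le k-1\}$. We observe that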



F,F,F;;F:vfc 



E 



f,f,f;;f:v. 



E 



aiXi 



A?|a,f 



= E 

1=1 Ej=i Wj 



>A 



(118) 



(119) 



(120) 



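Define $\mathbf{u}_k := \tilde{\mathbf{F}}_s^*\tilde{\mathbf{v}}_k$. From (116) we can see that $(\mathbf{u}_k)_i = (\tilde{\mathbf{F}}_s)_i^*\,\tilde{\mathbf{v}}_k = 0$ holds for all $i \in \{i_1, i_2, \cdots, i_{k-1}\}$. In other words, $\mathbf{u}_k \perp \{\mathbf{e}_{i_1}, \cdots, \mathbf{e}_{i_{k-1}}\}$. This further implies that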



F,F^F;;F:vfc 



< 



< sup ||F;,f;;x||2 



by observing that F^F^ is a diagonal matrix. 
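Setting

$$S_k\left(f - \frac{lf_s}{M}\right) = \begin{cases} 1, & \text{if } \left|H\left(f - \frac{lf_s}{M}\right)\right|^2 = \lambda_k\left(\mathbf{F}_h(f)\mathbf{F}_h^*(f)\right), \\ 0, & \text{otherwise}, \end{cases}$$

yields $\tilde{\mathbf{F}}_s = \mathbf{F}_s$ and hence $\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*$ is a diagonal matrix such that

$$\left(\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*\right)_{kk} = \lambda_k\left(\mathbf{F}_h\mathbf{F}_h^*\right).$$

Apparently, this choice of $S_k(f)$ allows the upper bounds

$$\lambda_k\left(\tilde{\mathbf{F}}_s\mathbf{F}_h\mathbf{F}_h^*\tilde{\mathbf{F}}_s^*\right) = \lambda_k\left(\mathbf{F}_h\mathbf{F}_h^*\right), \quad \forall\, 1 \le k \le M$$

to be attained simultaneously.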

Appendix H 
Proofs of Auxiliary Lemmas 

A. Proof of Lemma^ 

Suppose that $T = nT_s + T_0$ where $0 < T_0 < T_s$. Then

lsnpI{x{0,T];{y[n]}) = "^^^ sup I {x{0,T]; {y[n]}) 



> 



T nTs 
T nT, 



supl {x{0,nTs];{y[n]}) . 



Similarly, we have 
1 

f 



(121) 
(122) 
(123) 

(124) 



(125) 



(126) 



(127) 



(r, \ 1 

sup/(x(0,r];{y[n]}) < ^ \^ , sup/(x(0, (n + 1)T,]; {y[n]}) . 



T (n + 



Combining the above inequalities we obtain 

nTs 1 



lim 



supI{x{0,nT,];{y[n]}) < lim 



1 



n-s-oo nTg + To 

(n + 1) 1 



< lim 



n^oo T (n + 1) T, 



sup I {x{0,nT, + To]- {y[n]}) 

sup/(x(0, {n + l)Ts];{y[n]}). 
Both the lower and the upper bound in the above inequality involve transmission times that are integer multiples of $T_s$, so both converge to the channel capacity. This immediately leads to the limit when $T$ is not an integer multiple of $T_s$:

$$\lim_{n\to\infty}\frac{1}{nT_s + T_0}\sup I\left(x(0, nT_s + T_0]; \{y[n]\}\right) = \lim_{n\to\infty}\frac{1}{nT_s}\sup I\left(x(0, nT_s]; \{y[n]\}\right). \tag{128}$$

Since $T_0$ can be arbitrarily chosen from the interval $(0, T_s)$, we conclude that

$$\lim_{T\to\infty}\frac{1}{T}\sup I\left(x(0, T]; \{y[n]\}\right) = \lim_{n\to\infty}\frac{1}{nT_s}\sup I\left(x(0, nT_s]; \{y[n]\}\right). \tag{129}$$



B. Proof of Lemma^ 
For any i < j, we have 





< 







J2 ^j-i+t^l 



t=—oo 



+ 



J2 ^j-i+tK 
t=n-j+l 



(130) 



Since h{t) is absolutely summable and Riemann integrable, for sufficiently small A, there exists a constant 



c such that J2 



i=—oo 



< c. In the following analysis, we define and to capture the two residual 



terms respectively, i.e. 



t=n-j+l 



t=—oo 



By our assumptions, we have $h(t) = o(t^{-\epsilon})$ for some constant $\epsilon > 1$. Since $s(t)$ is also absolutely integrable, $\tilde{h}(t) = o(t^{-\epsilon})$ holds as well. Without loss of generality, suppose that $j \ge i$, then we have
(a) if $i \ge n^{\frac{1}{2\epsilon}}$, by the assumption $\tilde{h}(t) = o\left(\frac{1}{t^{\epsilon}}\right)$ for some $\epsilon > 1$, we have

-j 



t=—oo 

< max 

< max 

< c max 
— kc' o 



h-r 

h-r 
1 



/ t=—oo 



t=—oo 
(b) if $j \ge n^{\frac{1}{2\epsilon}}$,



t=—oo 



< 



■j-i+t 



\t=—oo 



max 

1 / r<-j 



< c max h 

r>n^ 

c • o I ^ I = o 

m J \ \/n 



(c) if $j < n^{\frac{1}{2\epsilon}}$ and $i < n^{\frac{1}{2\epsilon}}$, the Cauchy-Schwarz inequality yields

\ 2 



+ 00 



\t=—oo 



+00 



+00 



< 



E h 



<t=—oo 



Hence, we have 



lim -VV|R,U^< lim - + 2n^+^o f-^ 

1=1 j=l ^ ^ 



0. 



Similarly, we can show that 



lim -||r2||^ = o, 

n— ^oo Tl 



which immediately implies that 



lim — 



Since is a Toeplitz matrix, applying Lemma 6] and Section 4.1] yields 



oo oo 

i=0 t=—oo 
+00 oo 

^2 E 11^*11 Ell^^+t 



< 2c- 



t=—oo 
2 



i=0 



(131) 

(132) 
(133) 
Additionally, since $\mathbf{H}^n$ is a block Toeplitz matrix, [52, Corollary 4.2] allows us to bound the norm as

2—11 n\oo 
k-1 





= H" 




2 II II 



oo /k—1 
j=0 \i=0 



oo 



Hence, by definition of asymptotic equivalence, we have $\hat{\mathbf{H}}^n \sim \mathbf{H}^n\mathbf{H}^{n*}$.



C. Proof of Lemma^ 

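We know that $\mathbf{S}^n\mathbf{S}^{n*} = \hat{\mathbf{S}}^n$, hence, $\mathbf{C}^n \sim \hat{\mathbf{S}}^n = \mathbf{S}^n\mathbf{S}^{n*}$. Recall that $(\hat{\mathbf{S}}^n)_{1i} = \sum_{t=-\infty}^{\infty}\mathbf{s}_{i-1+t}\,\mathbf{s}_t^*$. For a given $k$, the Fourier series related to $\{\mathbf{C}^n\}$ can be given as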



oo / oo 



z=— OO oo / 

When k is sufficiently large, the Riemann integrability of s{t) implies that 

oo / ^+oo 



We observe that 



sit + T)s{tyd?j exp (^j^T^ dr 
J s{t + r) exp (^j^ (t + r)^ dr^ (^^^ 



UJ 



s(t)exp ( i^t ) dt 



5 -i 



UJ 



(134) 



F^{uj)^AY^ I / s(t + ir,)s(t)*dt I exp(jiw) (135) 
i=-oo ^ 

= A y (^y" s{t + T)s{tr d?j (^f2 Hr- iTs)^ exp (^j^r^ dr. (136) 



Since $F_c^k(\omega)$ corresponds to the Fourier transform of the signals obtained by uniformly sampling $\int s(t + \tau)s(t)^*\,dt$, we can immediately see that

2 



A 



(137) 
If for all $\omega \in [-\pi, \pi]$, we have



E 



> e, > 



(138) 



for some constant e^, then (Jmin (C^) = inf^ ^ci^) — which leads to (C^) ^ 



< 



Let = - S^S^*. Since S^S^* - C^, we can have lim^^oo ||S^||^ = 0, which implies that 



lim 



< lim 



I I 



F n^oo 

The Taylor expansion of (S^S^*)~^ yields 



(C") 



n\-l 



T 1 

< lim ^^||2"|L = 0. 

2 n^oo Ae. \ n 



Hence, we can bound 
1 



lim 



n\-\ 



< lim 

F n^oo 



(139) 

(140) 
(141) 

0. (142) 



D. Proof of Lemma^ 

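Since $(\mathbf{C}^n)^{-\frac{1}{2}}$ and $(\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}$ are both Hermitian and positive semidefinite, we have $(\mathbf{C}^n)^{-\frac{1}{2}} \sim (\mathbf{S}^n\mathbf{S}^{n*})^{-\frac{1}{2}}$.

We observe that $(\mathbf{C}^n)^{-\frac{1}{2}} = \mathbf{U}_c\boldsymbol{\Lambda}_c^{-\frac{1}{2}}\mathbf{U}_c^*$, which is also a circulant matrix. Combining the above results with Lemma 2 yields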



(S"S" 



(143) 



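where $(\mathbf{C}^n)^{-\frac{1}{2}}$ and $\hat{\mathbf{H}}^n$ are both Hermitian Toeplitz matrices. Denote by $F_{c^{-1/2}}(\omega)$, $F_c(\omega)$, $F_{\hat{h}}(\omega)$, $\mathbf{F}_h(\omega)$ the Fourier series related to $(\mathbf{C}^n)^{-\frac{1}{2}}$, $\mathbf{C}^n$, $\hat{\mathbf{H}}^n$, and $\mathbf{H}^n$, respectively. We note that $F_{c^{-1/2}}(\omega)$, $F_c(\omega)$ and $F_{\hat{h}}(\omega)$ are all scalars since their related matrices are Toeplitz, while $\mathbf{F}_h(\omega)$ is a $1 \times k$ vector since $\mathbf{H}^n$ is block Toeplitz. Then for any continuous function $g(x)$, applying [51, Theorem 12] yields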

1 " 



lim 



1=1 



- lim 

n— ^00 u 



1=1 



H" {C" 



^i-^Es{^.((c"r'H")} 



27r 
We observe that $\hat{\mathbf{H}}^n$ is asymptotically equivalent to $\mathbf{H}^n\mathbf{H}^{n*}$, and the eigenvalues of $\mathbf{H}^n\mathbf{H}^{n*}$ are exactly the squares of the corresponding singular values of $\mathbf{H}^n$. Hence, we know from [52] that for any continuous function $g(x)$:

A- it 4'- («") } = ito{'' («") } = tfj (^'.f'^") 

i=l i=l 

where F^(cj) can be expressed as F^(cc;) = o(^)' ' ' ' ' -^hk-i(^ 



Here, for any < i < A:: 



+00 



+00 



The above analysis implies that {uj) = (F^(cc;)). 
Through algebraic manipulation, we have that 



UCjJ) 



A 



+00 



Fh,ii^)-r E H{-f + lf,)exp{-j27r{f-lf,)iA), 



which yields 



k-i 



i=0 



A 



2 k-l 



A2 



i=0 
k-1 



+ CX) 



5^ ^ (-/ + exp {-j2n if - Ifs) iA) 



l=—oo 



^ E E ^ + ^1/^) ^* + ^2/«) exp (-i27r (l2 - ^i) /.iA) 

* i=0 h,l2 



A2 

2^2 



A 



J2H{-f + hfs)H^-f + hfs) 



-k-i 



,2=0 



^exp I -j27r (/2 - /i) 



7^2 
A 



+00 



\H{-f + lfs)S{-f + lfs)?. 



l=—oo 



Similarly, we have 



A 



+00 



Fc{f) = ^ J2 \S{-f + lfs)\' 



Combining the above results yields 

1 ^ 

lim - y 5 \Xi ( (S"S")-^ H"H"* (S"S"*)-^) } 
n^oo n ^ — ^ L V /J 



1=1 



f^'^ ( j:t^-o.\H{-f + ifs)s{-f+ifs)\' 



d/ 



(145) 
(146) 
(147) 

(148) 
(149) 
(150) 

(151) 

(152) 
(153) 
This completes the proof.



E. Proof of Lemma^ 

Denote by the Fourier symbol associated with the block Toeplitz matrix G^. We know that the 
Fourier transform of (t, r) with respect to r is given by 

j g"/ {t,T)exp{-j27TfT)dT 

= / / Si{t- Ti) Qi (ti) Pi (ti - T2) exp (-j27r/T2) dTidT2 

^ j Pi (n - T2) exp (j27r/ (ri - T2)) dT2 / Si{t- n) (n) exp (-j27r/Ti) dri 
(-/) exp (-i27rt/) • J] 0^(5 (/ - 



-P^ (-/) 



^^'i (-/) 



J] c'tSi (-/ + n/,) exp (-j27rt (/ - 



For any (/, m) such that 1 <l < a and 1 < m < a/c, the (/, m) entry of the Fourier symbol can be 
related to the sampling sequence of ga [iTg^ at a rate ^ with a phase shift mA, and hence it can be 
calculated as follows 



c^S (-/ + n/, + t;^) exp (-i27rir, - n/, - t;^) ) 

Using the fact that Yl^Zl ^^P (^i27r ^(i;2 — i^i) mA^ = aA;5 ['L'2 — vi], we get through algebraic 
manipulation that 

^K^*)^ ^ = afe 5^ P« f-/ + v^A J2 <^aSa (-/ + u,f, + t;^] exp (-j2nlt (f - uif, - 

5^ c^^5^ (^-/ + U2f, + v^^ exp l^-j2nlt (/ - ^^2/, - ) ) 



K 



Define another matrix such that 



It can be easily seen that 



Replacing $P_\alpha(f)$ by $P_\alpha(f)H(f)$ immediately gives us the Fourier symbol for $\mathbf{G}^h\mathbf{G}^{h*}$.
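The orthogonality relation invoked in this proof, a sum of $ak$ complex exponentials collapsing to $ak\,\delta[v_2 - v_1]$, can be checked numerically. The sketch below assumes the exponent reduces to $2\pi(v_2 - v_1)m/(ak)$ for $m = 0, \cdots, ak - 1$; that reduction and the function name are assumptions for illustration, since the exact exponent is not fully legible in the source.

```python
import cmath


def unit_root_sum(N, d):
    """Sum over m = 0..N-1 of exp(j * 2 * pi * d * m / N)."""
    return sum(cmath.exp(2j * cmath.pi * d * m / N) for m in range(N))
```

With `N = a * k`, the sum equals `N` when `d` is a multiple of `N` and vanishes (up to rounding) otherwise, which is what removes the cross terms between different alias indices.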